researcharticle revision: variance inflation in regressionpeople.virginia.edu/~der/pdf/vif.pdf ·...
TRANSCRIPT
Hindawi Publishing CorporationAdvances in Decision SciencesVolume 2013 Article ID 671204 15 pageshttpdxdoiorg1011552013671204
Research ArticleRevision Variance Inflation in Regression
D R Jensen1 and D E Ramirez2
1 Department of Statistics Virginia Tech Blacksburg VA 24061 USA2Department of Mathematics University of Virginia PO Box 400137 Charlottesville VA 22904 USA
Correspondence should be addressed to D E Ramirez dervirginiaedu
Received 12 October 2012 Accepted 10 December 2012
Academic Editor Khosrow Moshirvaziri
Copyright copy 2013 D R Jensen and D E Ramirez This is an open access article distributed under the Creative CommonsAttribution License which permits unrestricted use distribution and reproduction in any medium provided the original work isproperly cited
Variance Inflation Factors (VIFs) are reexamined as conditioning diagnostics for models with intercept with and without centeringregressors to their means as oft debated Conventional VIFs both centered and uncentered are flawed To rectify matters two typesof orthogonality are noted vector-space orthogonality and uncorrelated centered regressorsThe key to our approach lies in feasibleReference models encoding orthogonalities of these types For models with intercept it is found that (i) uncentered VIFs are notratios of variances as claimed owing to infeasible Reference models (ii) instead they supply informative angles between subspacesof regressors (iii) centered VIFs are incomplete if not misleading masking collinearity of regressors with the intercept and (iv)variance deflation may occur where ill-conditioned data yield smaller variances than their orthogonal surrogates ConventionalVIFs have all regressors linked or none often untenable in practice Beyond these our models enable the unlinking of regressorsthat can be unlinked while preserving dependence among those intrinsically linked Moreover known collinearity indices areextended to encompass angles between subspaces of regressors To reaccess ill-conditioned data we consider case studies rangingfrom elementary examples to data from the literature
1 Introduction
Given Y = X120573+120598 of full rank with zero-mean uncorrelatedand homoscedastic errors the 119901 equations X1015840X120573 = X1015840Yyield the OLS estimators for 120573 = [120573
1 120573
119901]1015840 as unbiased
with dispersion matrix 119881() = 1205902V and V= [119907
119894119895] = (X1015840X)
minus1Ill-conditioning as near dependent columns of X exertsprofound and interlinked consequences causing ldquocrucialelements of X1015840X to be large and unstablerdquo ldquocreating inflatedvariancesrdquo estimates excessive in magnitude irregular insign and ldquovery sensitive to small changes inXrdquo and unstablealgorithms having ldquodegraded numerical accuracyrdquo See [1ndash3]for example
Ill-conditioning diagnostics include the condition number1198881(X1015840X) the ratio of its largest to smallest eigenvalues and
the Variance Inflation Factors VIF(120573119895) = 119906119895119895119907119895119895 1 le 119895 le 119901
with U = X1015840X that is ratios of actual (119907119895119895) to ldquoidealrdquo (1119906
119895119895)
variances had the columns ofX been orthogonal On scalingthe latter to unit lengths and ordering VIF(120573
119895) = 119907
119895119895 1 le
119895 le 119901 as 1198811ge 1198812ge sdot sdot sdot ge 119881
119901 1198811is identified in [4] as
ldquothe best single measure of the conditioning of the datardquo Inaddition the bounds 119881
1le 1198881(X1015840X) le 119901(119881
1+ sdot sdot sdot + 119881
119901) of [5]
apply also in stepwise regression as in [6ndash9]Users deserve to be apprised not only that data are
ill conditioned but also about workings of the diagnosticsthemselves Accordingly we undertake here to rectify long-standing misconceptions in the use and properties of VIFsnecessarily retracing through several decades to their begin-nings
To choose a model with or without intercept is sub-stantive is specific to each experimental paradigm and isbeyond the scope of the present study Whatever the choicefixed in advance in a particular setting these models followon specializing 119901 = 119896 + 1 then 119901 = 119896 respectively as1198720 Y = X
0120579 + 120598 with intercept and M
119908 Y = X120573 + 120598
without where X0= [1119899X] X = [X
1X2 X
119896] comprise
vectors of regressors and 1205791015840
= [12057301205731015840
] with 1205730as inter-
cept and 1205731015840 = [1205731 120573
119896] as slopes The VIFs as defined
are essentially undisputed for models in 119872w as noted in[10] serving to gage effects of nonorthogonal regressors
2 Advances in Decision Sciences
as ratios of variances In contrast a yet unresolved debatesurrounds the choice of conditioning diagnostics for mod-els in 119872
0 namely between uncentered regressors giving
VIF119906s for [120573
0 1205731 120573
119896] and regressors X
119894centered to their
means giving VIF119888s for [120573
1 120573
119896] Specifically VIF
119906(120573119895) =
119906119895119895119907119895119895 119895 = 0 1 119896 on taking U = X1015840
0X0and V =
(X10158400X0)minus1 In contrast letting R = [119877
119894119895] be the correlation
matrix for = [1205731 120573
119896]1015840
and Rminus1 = [119877119894119895] its inverse then
centered versions are VIF119888(120573119895) = 119877119895119895 1 le 119895 le 119896 for slopes
onlyIt is seen here that (i) these differ profoundly in regard
to their respective concepts of orthogonality (ii) objectivesand meanings differ accordingly (iii) sharp divisions trace tomuddling these concepts and (iv) this distinction assumesa pivotal role here Over time a large body of widely heldbeliefs conjectures intrinsic propositions and conventionalwisdom has accumulated much from flawed logic some tobe dispelled here The key to our studies is that VIFs tobe meaningful must compare actual variances to those ofan ldquoidealrdquo second-moment matrix as reference the latter toembody the conjectured type of orthogonality This differsbetween centered and uncentered diagnostics and for bothtypes requires the reference matrix to be feasible An outlinefollows
Our undertaking consists essentially of four partsThe first is a literature survey of some essentials of ill-conditioning to include the divide between centered anduncentered diagnostics and conventional VIFs The struc-ture of orthogonality is examined next Anomalies inusage meaning and interpretation of conventional VIFs areexposed analytically and through elementary and transparentcase studies Long-standing but ostensible foundations inturn are reassessed and corrected through the constructionof ldquoReference modelsrdquo These are moment arrays constrainedto encode orthogonalities of the types considered Neitherarray returns the conventional VIF
119906s nor VIF
119888s Direct
rules for finding the amended Reference models are givenpreempting the need for constrained numerical algorithmsFinally studies of ill-conditioned data from the literature arereexamined in light of these findings
2 Preliminaries
21 Notation Designate byR119901 the Euclidean 119901-space Matri-ces and vectors are set in bold type the transpose and inverseof A are A1015840 and Aminus1 and A(119894119895) refers on occasion to the(119894 119895) element of A Special arrays are the identity I
119901 the unit
vector 1119901= [1 1 1]
1015840isin R119901 the diagonal matrix D
119886=
Diag(1198861 119886
119901) and B
119899= (I119899minus 119899minus1111989911015840119899) as idempotent of
rank 119899 minus 1 The Frobenius norm of x isin R119901 is x = (x1015840x)12ForX of order (119899times119901) and rank 119901 le 119899 a generalized inverse isdesignated as Xdagger its ordered singular values as 120590(X) = 120585
1ge
1205852
ge sdot sdot sdot ge 120585119901
gt 0 and by S119901(X) sub R119899 the subspace of
R119899 spanned by the columns of X By accepted conventionits condition number is 119888
2(X) = 120585
1120585119901 specifically 119888
2(X) =
[1198881(X1015840X)]
12 For our model Y = X120573 + 120598 with dispersion
119881(120598)=1205902I119899 we take 120590
2= 10 unless stated otherwise since
variance ratios are scale invariant
22 Case Study 1 A First Look That anomalies pervadeconventional VIF
119906s and VIF
119888s may be seen as follows Given
1198720 119884119894= 1205730+12057311198831+12057321198832+120598119894 the designX
0= [15X1X2]
of order (5 times 3)U=X10158400X0 and its inverse V=(X1015840
0X0)minus1 as in
X10158400= [
[
1 1 1 1 1
0 05 05 1 1
minus1 1 1 0 0
]
]
U = [
[
5 3 1
3 25 1
1 1 3
]
]
V = [
[
072222 minus088889 005556
minus088889 155556 minus022222
005556 minus022222 038889
]
]
(1)
Conventional centered and uncentered VIFs are
[VIF119888(1205731) VIF
119888(1205732)] = [10889 10889] (2)
[VIF119906(1205730)VIF
119906(1205731)VIF
119906(1205732)]= [36111 38889 11667]
(3)
respectively the former for slopes only and the latter takingreciprocals of the diagonals of X1015840
0X0as reference
A Critique The following points are basic Note first thatmodel (1) is nonorthogonal in both the centered and uncen-tered regressors
Remark 1 The VIF119906s are not ratios of variances and thus fail
to gage relative increases in variances owing to nonorthog-onal columns of X
0 This follows since the first row and
column of the second-moment matrix X10158400X0= U are fixed
and nonzero by design so that taking X10158400X0to be diagonal as
reference cannot be feasible
Remark 1 runs contrary to assertions throughout the liter-ature In consequence for models in 119872
0the mainstay VIF
119906s
in recent vogue are largely devoid of meaning Subsequentlythese are identified instead with angles quantifying degrees ofmulticollinearity among the regressors
On the other hand feasible Reference models for allparameters as identified later for centered and uncentereddata in Definition 13 Section 42 give
[VF119888(1205730)VF119888(1205731)VF119888(1205732)]= [09913 10889 10889]
(4)
[VF119907(1205730)VF119907(1205731)VF119907(1205732)]= [07704 08889 08889]
(5)
in lieu of conventional VIF119888s and VIF
119906s respectively The
former comprise corrected VIF119888s extended to include the
intercept Both sets in fact are genuine variance inflationfactors as ratios of variances in themodel (1) relative to thosein Reference models feasible for centered and for uncenteredregressors respectively
This example flagrantly contravenes conventional wis-dom (i) variances for slopes are inflated in (4) but for the
Advances in Decision Sciences 3
intercept deflated in comparison with the feasible centeredreference Specifically 120573
0is estimated here with greater
efficiency VF119888(1205730) = 09913 in the initial design (1) despite
nonorthogonality of its centered regressors (ii) Variances areuniformly smaller in the model (1) than in its feasible uncen-tered reference from (5) thus exhibiting Variance Deflationdespite nonorthogonality of the uncentered regressors Afull explication of the anatomy of this study appears laterIn support we next examine technical details needed insubsequent developments
23 Types of Orthogonality The ongoing divergence betweencentered and uncentered diagnostics traces in part to mean-ing ascribed to orthogonality Specifically the orthogonalityof columns of X in 119872
119908refers unambiguously to the vector-
space concept X119894perp X119895 that is X1015840
119894X119895
= 0 as does thenotion of collinearity of regressors with the constant vectorin 1198720 We refer to this as 119881-orthogonality in short 119881perp In
contrast nonorthogonality in 1198720often refers instead to the
statistical concept of correlation among columns of X whenscaled and centered to their means We refer to its negationas 119862-orthogonality or 119862
perp Distinguishing between thesenotions is fundamental as confusion otherwise is evident Forexample it is asserted in [11 p125] that ldquothe simple correlationcoefficient 119903
119894119895does measure linear dependency between 119909
119894
and 119909119895in the datardquo
24 The Models 1198720 Consider 119872
0 Y = X
0120579 + 120598 X
0=
[1119899X] 1205791015840 = [120573
01205731015840] together with
X10158400X0= [
119899 119899X1015840
119899X X1015840X] (X1015840
0X0)minus1
= [11988800
c01
c101584001
G ] (6)
where 119899X1015840 = 11015840119899X 11988800
= [119899 minus 1198992X1015840(X1015840X)
minus1X]minus1 C =
(X1015840Xminus119899XX1015840) =X1015840B119899X andG = Cminus1 from block-partitioned
inverses In later sections we often will denote the mean-centeringmatrix 119899XX1015840 byM In particular the centered formX rarr Z = B
119899X arises exclusively in models with intercept
with or without reparametrizing (1205730120573) rarr (120572120573) in the
mean-centered regressors where 120572 = 1205730+12057311198831+ sdot sdot sdot + 120573
119896119883119896
Scaling Z to unit column lengths gives Z1015840Z in correlationform with unit diagonals
3 Historical Perspective
Our objective is an overdue revision of the tenets of varianceinflation in regression To provide context we next surveyextenuating issues from the literature Direct quotations areintended not to subvert stances taken by the cited authorsModels in 119872
0are at issue since as noted in [10] centered
diagnostics have no place in119872119908
31 Background Aspects of ill-conditioning span a consid-erable literature over many decades Regarding Y = X120573 +120598 scaling columns of X to equal lengths approximatelyminimizes the condition number 119888
2(X) [12 p120] based on
[13] Nonetheless 1198882(X) is cast in [9] as a blunt instrument for
ill-conditioning prompting the need for VIFs and other localconstructs Stewart [9] credits VIFs in concept to Daniel andin name to Marquardt
Ill-conditioning points beyond OLS in view of difficultiescited earlier Remedies proffered in [14 15] include trans-forming variables adding new data and deleting variable(s)after checking critical features of the reduced model Otherintended palliatives include Ridge and Partial Least Squaresas compared in [16] Principal Components regression Sur-rogate models as in [17] All are intended to return reducedstandard errors at the expense of bias Moreover Surrogatesolutions more closely resemble those from an orthogonalsystem than Ridge [18] Together the foregoing and otheroptions comprise a substantial literature as collateral to butapart from the present study
32 To Center Advocacy for centering includes the follow-ing
(i) VIFs often are defined categorically as the diagonalsof the inverse of the correlation matrix of scaled andcentered regressors see [4 9 11 19] for exampleThese are VIF
119888s widely adopted without justification
as default to the exclusion of VIF119906s
(ii) It is asserted [4] that ldquocentering removes the nones-sential ill-conditioning thus reducing the varianceinflation in the coefficient estimatesrdquo
(iii) Centering is advocated when predictor variables arefar removed from origins on the basic data scales [1011]
33 Not to Center Advocacy for uncentered diagnosticsincludes the following caveats from proponents of centering
(i) Uncentered data should be examined only if an esti-mate of the intercept is of interest [9 10 20]
(ii) ldquoIf the domain of prediction includes the full rangefrom the natural origin through the range of data thecollinearity diagnostics should not bemean centeredrdquo[10 p84]
Other issues against centering derive in part from numer-ical analysis and work by Belsley
(i) Belsley [1] identifies 1198882(X) for a system Y = X120573+120598 as
ldquothe potential relative change in the LS solution b thatcan result from a small relative change in the datardquo
(ii) These require structurally interpretable variables asldquoones whose numerical values and (relative) differ-ences derive meaning and interpretability fromknowledge of the structure of the underlying lsquorealityrsquobeing modeledrdquo [1 p75]
(iii) ldquoThere is no such thing as lsquononessentialrsquo ill-condi-tioningrdquo and ldquomean-centering can remove from thedata the information needed to assess conditioningcorrectlyrdquo [1 p74]
(iv) ldquoCollinearity with the intercept can quite generallycorrupt the estimates of all parameters in the model
4 Advances in Decision Sciences
whether or not the intercept is itself of interest andwhether or not the data have been centeredrdquo [21 p90]
(v) An example [22 p121] gives severely ill-conditioneddata perfectly conditioned in centered form ldquocenter-ing alters neitherrdquo inflated variances nor extremelysensitive parameter estimates in the basic data more-over ldquodiagnosing the conditioning of the centereddata (which are perfectly conditioned) would com-pletely overlook this situation whereas diagnosticsbased on the basic data would notrdquo
(vi) To continue from [22] ill-conditioning persists inthe propagation of disturbances in that ldquoa 1 percentrelative change in theX
119894rsquos results in over a 40 relative
change in the estimatesrdquo despite perfect conditioningin centered form and ldquoknowledge of the effect ofsmall relative changes in the centered data is notmeaningful for assessing the sensitivity of the basic LSproblemrdquo since relative changes and their meaningsin centered anduncentered data oftendiffermarkedly
(vii) Regarding choice of origin ldquo(1) the investigator mustbe able to pick an origin against which small relativechanges can be appropriately assessed and (2) it is thedata measured relative to this origin that are relevantto diagnosing the conditioning of the LS problemrdquo[22 p126]
Other desiderata pertain
(i) ldquoBecause rewriting the model (in standardized vari-ables) does not affect any of the implicit estimates ithas no effect on the amount of information containedin the datardquo [23 p76]
(ii) Consequences of ill-advised diagnostics can besevere Degraded numerical accuracy traces to nearcollinearity of regressors with the constant vectorIn short centering fails to prevent a loss in numer-ical accuracy centered diagnostics are unable todiscern these potential accuracy problems whereasuncentered diagnostics are seen to work well Twowidely used statistical packages SAS and SPSS-Xfail to detect this type of ill-conditioning throughuse of centered diagnostics and thus return highlyinaccurate coefficient estimates For further detailssee [3]
On balance formodels in1198720the jury is out regarding the
use of centered or uncentered diagnostics to include VIFsEven Belsley [1] (and elsewhere) concedes circumstanceswhere centering does achieve structurally interpretable mod-els Of note is that the foregoing listed citations to Belsleyapply strictly to condition numbers 119888
2(X) other purveyors of
ill-conditioning specifically VIFs are not treated explicitly
34 A Synthesis It bears notice that (i) the origin how-ever remote from the cluster of regressors is essential forprediction and (ii) the prediction variance is invariant toparametrizing in centered or uncentered forms Additionalremarks are codified next for subsequent referral
Remark 2 Typically 119884 represents response to input vari-ables 119883
1 1198832 119883
119896 In a controlled experiment levels
are determined beforehand by subject-matter considerationsextraneous to the experiment to include minimal valuesHowever remote the origin on the basic data scales it seemsinformative in such circumstances to identify the originwith these minima In such cases the intercept is often ofsingular interest since 120573
0is then the standard against which
changes in 119884 are to be gaged as regressors vary We adopt thisconvention in subsequent studies from the archival literature
Remark 3 In summary the divergence in views whetherto center or not appears to be that critical aspects of ill-conditioning known and widely accepted for models in119872
119908
have been expropriated over decades mindlessly and withoutverification to apply point-by-point for models in119872
0
4 The Structure of Orthogonality
This section develops the foundations for Reference modelscapturing orthogonalities of types 119881
perp and 119862perp Essential
collateral results are given in support as well
41 Collinearity Indices Stewart [9] reexamines ill-condi-tioning from the perspective of numerical analysis Detailsfollow where X is a generic matrix of regressors havingcolumns x
119895 1 le 119895 le 119901 and Xdagger = (X1015840X)
minus1X1015840 is thegeneralized inverse of note having xdagger
119895 1 le 119895 le 119901 as
its typical rows Each corresponding collinearity index 120581 isdefined in [9 p72] as
120581119895=10038171003817100381710038171003817x119895
10038171003817100381710038171003817sdot10038171003817100381710038171003817xdagger119895
10038171003817100381710038171003817 1 le 119895 le 119901 (7)
constructed so as to be scale invariant Observe that xdagger1198952 is
found along the principal diagonal of [(Xdagger)(Xdagger)1015840] = (X1015840X)minus1
WhenX(119899times119896) inX0= [1119899X] is centered and scaled Section
3 of [9] shows that the centered collinearity indices 120581119888satisfy
1205812
119888119895= VIF
119888(120573119895) 1 le 119895 le 119896 In 119872
0with parameters (120573
0
120573) values corresponding to xdagger1198952 from Xdagger
0= (X10158400X0)minus1X10158400
lie along the principal diagonal of (X10158400X0)minus1 the uncentered
collinearity indices 120581119906now satisfy 120581
2
119906119895= x1198952sdot xdagger1198952
119895 =
0 1 119896 In particular since x0
= 1119899 we have 120581
2
1199060=
119899xdagger02 Moreover in119872
0it follows that the uncentered VIF
119906s
are squares of the collinearity indices that is VIF119906(120573119895) =
1205812
119906119895 119895 = 0 1 119896 Note the asymmetry that VIF
119888s exclude
the intercept in contrast to the inclusive VIF119906s That the
label Variance Inflation Factors for the latter is a misnomer iscovered in Remark 1 Nonetheless we continue the familiarnotation VIF
119906(120573119895) = 1205812
119906119895 119895 = 0 1 119896
Transcending the essential developments of [9] are con-nections between collinearity indices and angles between
Advances in Decision Sciences 5
subspaces To these ends choose a typical x119895in X0 and
rearrange X0as [x119895X[119895]] We next seek elements of
Q1015840119895(X10158400X0)minus1
Q119895= [
x1015840119895x119895
x1015840119895X[119895]
X1015840[119895]x119895X1015840[119895]X[119895]
]
minus1
(8)
as reordered by the permutation matrix Q119895 From the clock-
wise rule the (1 1) element of the inverse is
[x1015840119895(I119899minus P119895) x119895]minus1
= [x1015840119895x119895minus x1015840119895P119895x119895]minus1
=10038171003817100381710038171003817xdagger119895
10038171003817100381710038171003817
2 (9)
in succession for each 119895 = 0 1 119896 where P119895
=
X[119895][X1015840[119895]X[119895]]minus1X1015840[119895]
is the projection operator onto the sub-space S
119901(X[119895]) sub R119899 spanned by the columns of X
[119895] These
in turn enable us to connect 1205812119906119895 119895 = 0 1 119896 and similarly
for centered values 1205812119888119895 119895 = 1 119896 to the geometry of ill-
conditioning as follows
Theorem 4 For models in 1198720let VIF
119906(120573119895) = 120581
2
119906119895 119895 = 0
1 119896 be conventional uncentered VIF119906s in terms of Stew-
artrsquos [9] uncentered collinearity indices These in turn quantifythe extent of collinearities between subspaces through angles (in119889119890119892 as follows
(i) Angles between (x119895S119901(X[119895])) are given by 120579
119895=
arccos[(1minus11205812uj)12
] in succession for 119895 = 0 1 119896
(ii) In particular 1205790
= arccos[(1 minus 11205812u0)12
] quantifiesthe degree of collinearity between the regressors and theconstant vector
(iii) Similarly let B119899X = Z = [z
1 z
119896] be regressors
centered to their means rearrange as [z119895Z[119895]] and let
VIF119888(120573119895) = 120581
2
119888119895 119895 = 1 119896 be centered VIF
119888s in
terms of Stewartrsquos centered collinearity indices Thenangles (in deg) between (z
119895S119901(Z[119895])) are given by
120579119895= arccos[(1 minus 11205812cj)
12] 1 le j le k
Proof From the geometry of the right triangle formed by(x119895P119895x119895) the squared lengths satisfy x
1198952
= P119895x1198952+
x119895minus P119895x1198952 where x
119895minus P119895x1198952
= RS119895is the residual sum
of squares from the projection It follows that the principalangle between (x
119895P119895x119895) is given by
cos (120579119895)=
x1015840119895P119895x119895
10038171003817100381710038171003817x119895
10038171003817100381710038171003817sdot10038171003817100381710038171003817P119895x119895
10038171003817100381710038171003817
=
10038171003817100381710038171003817P119895x119895
1003817100381710038171003817100381710038171003817100381710038171003817x119895
10038171003817100381710038171003817
=radic1 minusRS119895
10038171003817100381710038171003817x119895
10038171003817100381710038171003817
2=radic1 minus
1
1205812119906119895
(10)
for 119895 = 0 1 119896 to give conclusion (i) Conclusion(ii) follows on specializing (x
0P0x0) with x
0= 1119899and
P0= X(X1015840X)
minus1X1015840 Conclusion (iii) follows similarly from thegeometry of the right triangle formed by (z
119895P119895z119895) where P
119895
now is the projection operator onto the subspace S119901(Z[119895]) sub
R119899 spanned by the columns of Z[119895] to complete our proof
Remark 5 Rules of thumb in common use for problematicVIFs include those exceeding 10 or even 4 see [11 24] forexample In angular measure these correspond respectivelyto 120579119895lt 18435 deg and 120579
119895lt 300 deg
42 Reference Models We seek as Reference feasible modelsencoding orthogonalities of types 119881
perp and 119862perp The keys
are as follows (i) to retain essentials of the experimentalstructure and (ii) to alter what may be changed to achieveorthogonality For a model in119872
119908with moment matrix X1015840X
our opening paragraph prescribes as reference the modelR119881
= D = Diag(11988911 119889
119901119901) as diagonal elements of
X1015840X for assessing 119881perp-orthogonality Moreover on scaling
columns of X to equal lengths R119881is perfectly conditioned
with 1198881(R119881) = 10 In addition every model in 119872
119908clearly
conforms with its Reference in the sense that R119881is positive
definite as distinct from models in1198720to follow
Consider again models in 1198720as in (6) let C = (X1015840X minus
M) withM = 119899XX1015840 as the mean-centering matrix and againletD = Diag(119889
11 119889
119896119896) comprise the diagonal elements of
X1015840X
(i) 119881perpReference ModelThe uncentered VIF119906s in119872
0 defined
as ratios of diagonal elements of (X10158400X0)minus1 to reciprocals of
diagonal elements of X10158400X0 appear to have seen exclusive
usage apparently in keepingwithRemark 3However the fol-lowing disclaimermust be registered as the formal equivalentof Remark 1
Theorem 6 Statements that conventional VIF119906s quantify var-
iance inflation owing to nondiagonalX10158400X0are false for models
in1198720having X = 0
Proof Since the Reference variances are reciprocals of diag-onal elements of X1015840
0X0 this usage is predicated on the false
postulate that X10158400X0can be diagonal for X = 0 Specifically
[1119899X] are linearly independent that is 11015840
119899X = 0 if and only
if X has been mean centered beforehand
To the contrary Gunst [25] purports to show thatVIF119906(1205730) registers genuine variance inflation namely the
price to be paid in variance for designing an experimenthaving X = 0 as opposed to X = 0 Since variances forintercepts are Var(sdot | X = 0) = 120590
2119899 and Var(sdot | X = 0) =
120590211988800
ge 1205902119899 from (6) their ratio is shown in [25] to be
1205812
1199060= 11989911988800
ge 1 in the parlance of Section 23 We concede thisto be a ratio of variances but to the contrary not a VIF sincethe parameters differ In particular Var(120573
0| X = 0) = 120590
211988800
whereas Var( | X = 0) = 1205902119899 with 120572 = 120573
0+ 12057311198831+
sdot sdot sdot + 120573119896119883119896in centered regressors Nonetheless we still find
the conventional VIF119906(1205730) to be useful for purposes to follow
Remark 7 Section 3 highlights continuing concern in regardto collinearity of regressors with the constant vectorTheorem 4(ii) and expression (10) support the use of 120579
0=
arccos[(1 minus 11205812u0)12
] as an informative gage on the extent of
6 Advances in Decision Sciences
this occurrence Specifically the smaller the angle the greaterthe extent of such collinearity
Instead of conventional VIF119906s given the foregoing dis-
claimer we have the following amended version as Referencefor uncentered diagnostics altering whatmay be changed butleaving X = 0 intact
Definition 8 Given a model in 1198720with second-moment
matrixX10158400X0 the amendedReferencemodel for assessing119881perp-
orthogonality is R119881
= [ 119899 119899X1015840
119899X D] with D = Diag(119889
11 119889
119896119896)
as diagonal elements of X1015840X provided that R119881is positive
definite We identify a model to be 119881perp-orthogonal when
X10158400X0= R119881
As anticipated a prospective R119881
fails to conform toexperimental data if not positive definite These and furtherprospects are covered in the following where 120601
119894designates
the angle between (1119899X119894) 119894 = 1 119896
Lemma 9 Take R119881
as a prospective Reference for 119881perp-
orthogonality
(i) In order that R119881maybe positive definite it is necessary
that 119899X1015840Dminus1X lt 1 that is that 119899sum119896
119894=1(1198832
119894119889119894119894) lt 1
(ii) Equivalently it is necessary thatsum119896119894=1
cos2(120601119894) lt 1with
120601119894as the angle between (1
119899Xi) 119894 = 1 119896
(iii) The Reference variance for 1205730is Var(120573
0| X1015840X = D) =
[119899(1 minus 119899X1015840Dminus1X)]minus1
(iv) The Reference variances for slopes are given by
Var (120573119894| X1015840X = D) =
1
119889119894119894
[1 +1205741198832
119894
119889119894119894
] 1 le 119894 le 119896 (11)
where 120574 = 119899(1 minus 119899X1015840Dminus1X)
Proof The clockwise rule for determinants gives |R119881| =
|D|[119899 minus 1198992X1015840Dminus1X] Conclusion (i) follows since |D| gt 0
The computation
cos (120601119894) =
11015840119899X119894
1003817100381710038171003817111989910038171003817100381710038171003817100381710038171003817X119894
1003817100381710038171003817=
11015840119899X119894
radic119899119889119894119894
= radic1198991198832
119894
119889119894119894
(12)
in parallel with (10) gives conclusion (ii) Using the clockwiserule for block-partitioned inverses the (1 1) element of Rminus1
119881
is given by conclusion (iii) Similarly the lower right block ofRminus1119881 of order (119896 times 119896) is the inverse of H = D minus 119899XX1015840 On
identifying a = b = X 120572 = 119899 and 119886lowast
119894= 119886119894119889119894119894 in Theorem
833 of [26] we have that Hminus1 = Dminus1 + 120574Dminus1XX1015840Dminus1Conclusion (iv) follows on extracting its diagonal elementsto complete our proof
Corollary 10 For the case 119896 = 2 in order that R119881maybe
positive definite it is necessary that 1206011+ 1206012gt 90 deg
Proof Beginning with Lemma 9(ii) compute
cos (1206011+ 1206012)
= cos1206011cos1206012minus sin120601
1sin1206012
= cos1206011cos1206012minus[1minus(cos2120601
1+cos2120601
2)+cos2120601
1cos21206012]12
(13)
which is lt0 when 1 minus (cos21206011+ cos2120601
2) gt 0
Moreover the matrix R119881itself is intrinsically ill conditioned
owing to X = 0 its condition number depending on X Toquantify this dependence we have the following wherecolumns of X have been standardized to common lengths|X119894| = 119888 1 le 119894 le 119896
Lemma 11 Let R119881= [ 119899 T1015840
T 119888I119896
] as in Definition 8 with T = 119899XD = 119888I
119896 and 119888 gt 0
(i) The ordered eigenvalues of R119881are [120582
1 119888 119888 120582
119901]
where 119901 = 119896 + 1 and (1205821 120582119901) are the roots of 120582 isin
[(119899 + 119888) plusmn radic(119899 minus 119888)2+ 41205912]2 and 120591
2= T1015840T
(ii) The roots are positive andR119881is positive definite if and
only if (119899119888 minus 1205912) gt 0
(iii) If R119881
is positive definite its condition number is1198881(R119881) = 1205821120582119901and is increasing in 120591
2
Proof Eigenvalues are roots of the determinantal equation
Det [(119899 minus 120582) T1015840T (119888 minus 120582) I
119896
]
= 0 lArrrArr (119888 minus 120582)119896minus1
[(119899 minus 120582) (119888 minus 120582) minus T1015840T] = 0
(14)
from the clockwise rule giving (119896 minus 1) values 120582 = 119888 and tworoots of the quadratic equation 120582
2minus (119899 + 119888)120582 + 119899119888 minus 120591
2= 0
to give conclusion (i) Conclusion (ii) holds since the productof roots of the quadratic equation is (119899119888 minus 120591
2) and the greater
root is positive Conclusion (iii) follows directly to completeour proof
(119894119894) 119862perp Reference Model As noted the notion of 119862
perp-or-thogonality applies exclusively for models in 119872
0 Accord-
ingly as Reference we seek to alter X1015840X so that the matrixcomprising sums of squares and sums of products of devia-tions from means thus altered is diagonal To achieve thiscanon of 119862
perp-orthogonality and to anticipate notation forsubsequent use we have the following
Definition 12 (i) For a model in 1198720with second-moment
matrix X10158400X0and inverse as in (6) let C
119908= (W minus M) and
identify an indicator vector G = [11989212 11989213 119892
1119896 11989223
1198922119896 119892
119896minus1119896] of order ((119896 minus 1) times (119896 minus 2)) in lexicographic
order where 119892119894119895= 0 119894 = 119895 if the (119894 119895) element of C
119908is zero
and 119892119894119895= 1 119894 = 119895 if the (119894 119895) element ofC
119908is (X1015840119894X119895minus119899119883119894119883119895)
that is unchanged from C = (X1015840X minus 119899XX1015840) = Z1015840Z withZ = B
119899X as the regressors centered to their means
Advances in Decision Sciences 7
(ii) In particular the Reference model in1198720for assessing
119862perp-orthogonality is R
119862= [ 119899 119899X
1015840
119899X W] such that C
119908= (W minus
M) and its inverse G from (6) are diagonal that is takingW = [119908
119894119895] such that 119908
119894119894= 119889119894119894 1 le 119894 le 119896 and 119908
119894119895=
119898119894119895 for all 119894 = 119895 or equivalently G = [0 0 0] In this
case we identify the model to be 119862perp-orthogonal
Recall that conventional VIF119888s for [120573
1 1205732 120573
119896] are
ratios of diagonal elements of (Z1015840Z)minus1 to reciprocals ofthe diagonal elements of the centered Z1015840Z Apparently thisrests on the logic that 119862perp-orthogonality in 119872
0implies that
Z1015840Z is diagonal However the converse fails and instead isembodied in R
119862isin 1198720 In consequence conventional VIF
119888s
are deficient in applying only to slopes whereas the VF119888s
resulting from Definition 12(ii) apply informatively for all of[1205730 1205731 1205732 120573
119896]
Definition 13 Designate VF119907(sdot) and VF
119888(sdot) as Variance Fac-
tors resulting from Reference models of Definition 8 andDefinition 12(ii) respectively On occasion these representVariance Deflation in addition to VIFs
Essential invariance properties pertain ConventionalVIF119888s are scale invariant see Section 3 of [9]We see next that
they are translation invariant as well To these ends we shiftthe columns of X=[X
1X2 X
119896] to a new origin through
X rarr X minus L = H where L = [11988811119899 11988821119899 119888
1198961119899] The
resulting model is
Y = (12057301119899+ L120573) +H120573 + 120598
lArrrArr Y = (1205730+ 120595) 1
119899+H120573 + 120598
(15)
thus preserving slopes where120595 = c1015840120573=(11988811205731+11988821205732+sdot sdot sdot+119888
119896120573119896)
Corresponding to X0= [1119899X] is U
0= [1119899H] and to X1015840
0X0
and its inverse are
U10158400U0= [
119899 11015840119899H
H10158401119899
H1015840H] (U10158400U0)minus1
= [11988700
b01
b101584001
G ] (16)
in the form of (6)This pertains to subsequent developmentsand basic invariance results emerge as follows
Lemma 14 Consider Y = 12057301119899+ X120573 + 120598 together with the
shifted version Y = (1205730+ 120595)1
119899+ H120573 + 120576 both in 119872
0 Then
the matrices G appearing in (6) and (16) are identical
Proof Rules for block-partitioned inverses again assert thatG of (16) is the inverse of
(H1015840H minus 119899minus1H10158401
11989911015840119899H) = H1015840 (I
119899minus 119899minus1111989911015840119899)H
= H1015840B119899H = X1015840B
119899X
(17)
since B119899L = 0 to complete our proof
These facts in turn support subsequent claims that cen-tered VIF
119888s are translation and scale invariant for slopes
1015840
=
[1205731 1205732 120573
119896] apart from the intercept 120573
0
43 A Critique Again we distinguish VIF119888s and VIF
119906s from
centered and uncentered regressorsThe following commentsapply
(C1) A design is either 119881perp or 119862perp-orthogonal respectivelyaccording as the lower right (119896times119896) blockX1015840X ofX1015840
0X0
orC = (X1015840Xminus119899XX1015840) from expression (6) is diagonalOrthogonalities of type 119881perp and 119862
perp are exclusive andhence work at crossed purposes
(C2) In particular 119862perp-orthogonality holds if and only ifthe columns of X are 119881
perp-nonorthogonal If X is 119862perp-orthogonal then for 119894 = 119895 C
119908(119894119895) = X1015840
119894X119895minus 119899119883119894119883119895=
0 and X1015840119894X119895
= 0 Conversely if X indeed is 119881perp-
orthogonal so that X1015840X = D= Diag(11988911 119889
119896119896)
then (D minus 119899XX1015840) cannot be diagonal as Reference inwhich case the conventional VIF
119888s are tenuous
(C3) Conventional VIF119906s based on uncentered X in 119872
0
with X = 0 do not gage variance inflation as claimedfounded on the false tenet that X1015840
0X0can be diagonal
(C4) To detect influential observations and to classify highleverage points casendashinfluence diagnostics namely119891119895119868
= VIF119895(minus119868)
VIF119895 1 le 119895 le 119896 are studied in [27]
for assessing the impact of subsets on variance infla-tion Here VIF
119895is from the full data and VIF
119895(minus119868)on
deleting observations in the index set 119868 Similarly [28]proposes using VIF
(119894)on deleting the 119894th observation
In the present context these would gain substance onmodifying VF
119907and VF
119888accordingly
5 Case Study 1 Continued
51 The Setting We continue an elementary and transparentexample to illustrate essentials Recall119872
0 119884119894= 1205730+12057311198831+
12057321198832+120598119894 of Section 22 the designX
0= [15X1X2] of order
(5 times 3) and U = X10158400X0 and its inverse V = (X10158400X0)minus1 as in
expressions (1) The design is neither 119881perp nor 119862perp orthogonalsince neither the lower right (2 times 2) block of X1015840
0X0nor the
centered Z1015840Z is diagonal Moreover the uncentered VIF119906s as
listed in (3) are not the vaunted relative increases in variancesowing to nonorthogonal columns of X
0 Indeed the only
opportunity for 119881perp-orthogonality here is between columns
[X1X2]
Nonetheless from Section 41 we utilize Theorem 4(i)and (10) to connect the collinearity indices of [9] to anglesbetween subspaces Specifically Minitab recovered valuesfor the residuals RS
119895= x
119895minus P119895x1198952 119895 = 0 1 2 fur-
ther computations proceed routinely as listed in Table 1 Inparticular the principal angle 120579
0between the constant vector
and the span of the regressor vectors is 1205790= 31751 deg as
anticipated in Remark 7
52 119881perp-Orthogonality For subsequent reference let 119863(119909) bea design as in (1) but with second-moment matrix
U (119909) = [
[
5 3 1
3 25 119909
1 119909 3
]
]
(18)
8 Advances in Decision Sciences
Table 1 Values for residuals RS119895= x119895minus P119895x1198952 for X
1198952 for 1205812
119906119895= VI F
119906(120573119895) for (1 minus 1120581
2
119906119895)12 and for 120579
119895 for 119895 = 0 1 2
119895 RS119895
X1198952
1205812
119895(1 minus 1120581
2
119906119895)12
120579119895(deg)
0 13846 50 36111 08503 317511 06429 25 38889 08619 304702 25714 30 11667 03780 67792
Invoking Definition 8 we seek a 119881perp-orthogonal Reference
with moment matrix U(0) having X10158401X2
= 0 that is 119863(0)This is found constructively on retaining [1
5 ∙X2] in X
0
but replacing X1by X01
= [1 05 05 1 0]1015840 as listed in
Table 2 giving the design 119863(0) with columns [15X01X2] as
ReferenceAccordingly R
119881= U(0) is ldquoas 119881
perp-orthogonal as it cangetrdquo given the lengths and sums of (X
1X2) as prescribed
in the experiment and as preserved in the Reference modelLemma 9 with X=[06 02]
1015840 and D= Diag(25 3) givesX1015840Dminus1X = 0157333 and 120574 = 5(1 minus 5X1015840Dminus1X) =
234375 Applications of Lemma 9(iii)-(iv) in succession givethe reference variances
Var (1205730| X1015840X = D) = [5 (1 minus 5X1015840Dminus1X)]
minus1
= 09375
Var (1205731|X1015840X=D)=
1
11988911
[1+1205741198832
1
11988911
]=2
5[1+
072120574
5]=17500
Var (1205732|X1015840X=D)=
1
11988922
[1+1205741198832
2
11988922
]=1
3[1+
004120574
3]=04375
(19)
As Reference these combine with actual variances from119881 = [119881
119894119895] at (1) giving VF
119907for the original designX
0relative
to R119881 For example VF
119907(1205730) = 0722209375 = 07704 with
11988111
= 07222 from (1) and 09375 from (19) This contrastswith VIF
119906(1205730) = 36111 in (3) as in Section 22 and Table 2
it is noteworthy that the nonorthogonal design 119863(1) yieldsuniformly smaller variances than the 119881
perp-orthogonal R119881
namely119863(0)Further versions 119863(119909) 119909 isin [minus05 00 05 06 10 15]
of our basic design are listed for comparison in Table 2 where119909 = X1015840
1X2for the various designs The designs themselves are
exhibited on transposing rowsX10158401= [11988311 11988312 11988313 11988314 11988315]
from Table 2 and substituting each into the template X00
=
[15 ∙X2] The design 119863(2) was constructed but not listed
since its X10158400X0is not invertible Clearly these are actual
designs amenable to experimental deployment
53 119862perp-Orthogonality Continuing X0
= [15X1X2] and
invoking Definition 12(ii) we seek a 119862perp-orthogonal
Reference having the matrix C119908
as diagonal This is
identified as 119863(06) in Table 2 From this the matrix R119862and
its inverse are
R119862= [
[
5 3 1
3 25 06
1 06 3
]
]
Rminus1119862
= [
[
07286 minus08572 minus00714
minus08571 14286 00
minus00714 00 03571
]
]
(20)
The variance factors are listed in (4) where for exampleVF119888(1205732) = 0388903571 = 10889 As distinct from conven-
tional VIF119888s for 120573
1and 120573
2only our VF
119888(1205730) here reflects
Variance Deflationwherein 1205730is estimated more precisely in
the initial 119862perp-nonorthogonal designA further note on orthogonality is germane Suppose
that the actual experiment is 119863(0) yet towards a thoroughdiagnosis the user evaluates the conventional VIF
119888(1205731) =
12250 = VIF119888(1205732) as ratios of variances of 119863(0) to 119863(06)
from Table 2 Unfortunately their meaning is obscured sincea design cannot at once be 119881perp and 119862
perp-orthogonalAs in Remark 2 we next alter the basic designX
0at (1) on
shifting columns of themeasurements to [0 0] asminima andscaling to have squared lengths equal to 119899 = 5 The resultingX0follows directly from (1) The new matrix U = X
1015840
0X0 andits inverse V are
U = [
[
5 42426 42426
42426 5 4
42426 4 5
]
]
V = [
[
1 minus04714 minus04714
minus04714 07778 minus02222
minus04714 minus02222 07778
]
]
(21)
giving conventional VIF119906s as [5 38889 38889] Against 119862perp-
orthogonality this gives the diagnostics
[VF119888(1205730) VF119888(1205731) VF119888(1205732)] = [08140 10889 10889]
(22)
demonstrating in comparison with (4) that VF119888s apart from
1205730 are invariant under translation and scaling a consequence
of Lemma 14We further seek to compare variances for the shifted
and scaled (X1X2) against 119881perp-orthogonality as Reference
However from U = X10158400X0at (21) we determine that 119883
1=
084853 = 1198832and 5[(119883
2
15) + (119883
2
25)] = 14400 Since
the latter exceeds unity we infer from Lemma 9(i) that
Advances in Decision Sciences 9
Table 2 Designs 119863(119909) 119909 isin [minus05 00 05 06 10 15] obtained by substitutingX01= [11988311 11988312 11988313 11988314 11988315]1015840 in [15 ∙X2] of order (5times3)
and corresponding variances Var (1205730)Var (120573
1)Var (120573
2)
Design 11988311
11988312
11988313
11988314
11988315
Var (1205730) Var (120573
1) Var (120573
2)
119863(minus05) 10 00 05 10 05 19333 37333 09333119863(00) 10 05 05 10 00 09375 17500 04375119863(05) 05 10 00 10 05 07436 14359 03590119863(06) 06091 05793 06298 11819 0000 07286 14286 03570119863(10) 00 05 05 10 10 07222 15556 03889119863(15) 00 10 05 05 10 09130 24348 06087
119881perp-orthogonality is incompatible with this configuration
of the data so that the VF119907s are undefined Equivalently
on evaluating 1206011
= 1206012
= 2290 deg and 1206011+ 1206012
=
4580 lt 90 deg Corollary 10 asserts that R119881is not positive
definite This appears to be anomalous until we recallthat 119881perp-orthogonality is invariant under rescaling but notrecentering the regressors
54 A Critique We reiterate apparent ambiguity ascribedin the literature to orthogonality A number of widely heldguiding precepts has evolved as enumerated in part in ouropening paragraph and in Section 3 As in Remark 3 theseclearly have been taken to apply verbatim forX
0andX1015840
0X0in
1198720 to be paraphrased in part as follows(P1) Ill-conditioning espouses inflated variances that is
VIFs necessarily equal or exceed unity(P2) 119862perp-orthogonal designs are ldquoidealrdquo in that VIF
119888s for
such designs are all unity see [11] for exampleWe next reassess these precepts as they play out under119881perp and 119862
perp orthogonality in connection with Table 2(C1) For models in 119872
0having X = 0 we reiterate that
the uncentered VIF119906s at expression (3) namely
[36111 38889 11667] overstate adverse effects onvariances of the 119881perp-nonorthogonal array since X1015840
0X0
cannot be diagonal Moreover these conventionalvalues fail to discern that the revised values for VF
119907s
at (5) namely [07704 08889 08889] reflect that[1205730 1205731 1205732] are estimated with greater efficiency in the
119881perp-nonorthogonal array119863(1)
(C2) For 119881perp-orthogonality the claim P1 thus is false by
counterexampleTheVFs at (5) areVarianceDeflationFactors for design 119863(1) where X1015840
1X2= 1 relative to
the 119881perp-orthogonal119863(0)(C3) Additionally the variances Var[120573
0(119909)] Var[120573
1(119909)]
Var[1205732(119909)] in Table 2 are all seen to decrease from
119863(0) [09375 17500 04375] rarr 119863(1) [0722215556 03889] despite the transition from the 119881
perp-orthogonal 119863(0) to the 119881
perp-nonorthogonal 119863(1)Similar trends are confirmed in other ill-conditioneddata sets from the literature
(C4) For 119862perp-orthogonal designs the claim P2 is false by
counterexample Such designs need not be ldquoidealrdquoas 1205730may be estimated more efficiently in a 119862
perp-nonorthogonal design as demonstrated by the VF
119888s
for 1205730at (4) and (22) Similar trends are confirmed
elsewhere as seen subsequently
(C5) VFs for 1205730are critical in prediction where prediction
variances necessarily depend on Var(1205730) especially
for predicting near the origin of the system of coor-dinates
(C6) Dissonance between 119881perp and 119862
perp is seen in Table 2where 119863(06) as 119862
perp-orthogonal with X10158401X2
= 06is the antithesis of 119881perp-orthogonality at 119863(0) whereX10158401X2= 0
(C7) In short these transparent examples serve to dispelthe decades-old mantra that ill-conditioning neces-sarily spawns inflated variances formodels in119872
0 and
they serve to illuminate the contributing structures
6 Orthogonal and Linked Arrays
A genuine 119881perp-orthogonal array was generated as eigenvec-
tors E = [E1E2 E
8] from a positive definite (8 times 8)
matrix (see Table 83 of [11 p377]) using PROC IML of theSAS programming package The columns [X
1X2X3] as the
second through fourth columns of Table 3 comprise the firstthree eigenvectors scaled to length 8 These apply for modelsin 119872119908and 119872
0to be analyzed In addition linked vectors
(Z1Z2Z3) were constructed as Z
1= 119888(09E
1+ 01E
2)
Z2
= 119888(09E2+ 01E
3) and Z
3= 119888(09E
3+ 01E
4) where
119888 = radic8082 Clearly these arrays as listed in Table 3 are notarchaic abstractions as both are amenable to experimentalimplementation
61 The Model 1198720 We consider in turn the orthogonal and
the linked series
611 Orthogonal Data Matrices X10158400X0and (X1015840
0X0)minus1 for the
orthogonal data under model1198720are listed in Table 4 where
variances occupy diagonals of (X10158400X0)minus1 The conventional
uncentered VIF119906s are [120296 102144 112256 105888]
Since X = 0 we find the angle between the constant andthe span of the regressor vectors to be 120579
0= 65748 deg as
in Theorem 4(ii) Moreover the angle between X1and the
span of [1119899X2X3] namely 120579
1= 81670 deg is not 90 deg
because of collinearity with the constant vector despite themutual orthogonality of [X
1X2X3] Observe here thatX1015840
0X0
10 Advances in Decision Sciences
Table 3 Basic 119881perp-orthogonal design X0 = [18 X1 X2 X3] and a linked design Z0 = [18 Z1 Z2 Z3] of order (8 times 4)
Orthogonal (X1X2X3) Linked (Z
1Z2Z3)
18
X1
X2
X3
18
Z1
Z2
Z3
1 minus1084470 0056899 0541778 1 minus1071550 0116381 06534861 minus1090010 minus0040490 0117026 1 minus1087820 minus0027320 01160861 0979050 0002564 2337454 1 0973345 0260677 21996871 minus1090230 0016839 0170711 1 minus1081700 0035588 00706941 0127352 2821941 minus0071670 1 0438204 2796767 minus00707101 1045409 minus0122670 minus1476870 1 1025468 minus0285010 minus16185601 1090803 minus0088970 0043354 1 1074306 minus0083640 01769281 1090739 minus0092300 0108637 1 1073875 minus0079740 0244567
Table 4 Matrices X10158400X0 and (X10158400X0)minus1 for the orthogonal data under model119872
0
X10158400X0
(X10158400X0)minus1
8 10686 25538 17704 01504 minus00201 minus00480 minus0033310686 8 0 0 minus00201 01277 00064 0004425538 0 8 0 minus00480 00064 01403 0010617704 0 0 8 minus00333 00044 00106 01324
already is 119881perp-orthogonal accordingly R119881= X10158400X0 and thus
the VF119907s are all unity
In view of dissonance between 119881perp and 119862
perp orthogonalitythe sums of squares and products for the mean-centered Xare
X1015840B119899X = [
[
78572 minus03411 minus02365
minus03411 71847 minus05652
minus02365 minus05652 76082
]
]
(23)
reflecting nonnegligible negative dependencies to distin-guish the 119881
perp-orthogonal X from the 119862perp-nonorthogonal
matrix Z = B119899X of deviations
To continue we suppose instead that the 119881perp-orthogonalmodel was to be recast as 119862
perp-orthogonal The Referencemodel and its inverse are listed in Table 5 Variance factorsas ratios of diagonal elements of (X1015840
0X0)minus1 in Table 4 to those
of Rminus1119862
in Table 5 are
[VF119888(1205730) VF119888(1205731) VF119888(1205732) VF119888(1205733)]
= [10168 10032 10082 10070]
(24)
reflecting negligible differences in precision This parallelsresults reported in Table 2 comparing 119863(0) to 119863(06) asReference except for larger differences in Table 2 namely[128681225012250]
612 Linked Data For the Linked Data under model 1198720
the matrix Z10158400Z0 follows routinely from Table 3 its inverse(Z10158400Z0)minus1 is listed in Table 6 The conventional uncentered
VIF119906s are now [120320 103408 113760 105480] These
again fail to gage variance inflation since Z10158400Z0cannot be
diagonal From (10) the principal angle between the constantvector and the span of the regressor vectors is 120579
0= 65735
degrees
Remark 15 Observe that the choice for R119862may be posed as
seeking R119862
= U(119909) such that Rminus1119862(2 3) = 00 then solving
numerically for 119909 using Maple for example However thealgorithm in Definition 12(ii) affords a direct solution thatC119908should be diagonal stipulates its off-diagonal element as
X10158401X2minus 35 = 0 so that X1015840
1X2= 06 in R
119862at expression (20)
To illustrate Theorem 4(iii) we compute cos(120579) = (1 minus
110889)12 = 02857 and 120579 = 73398 deg as the angle between
the vectors [X1X2] of (1) when centered to their means For
slopes in 1198720 [9] shows that VIF
119888(120573119895) lt VIF
119906(120573119895) 1 le
119894 le 119896 This follows since their numerators are equal butdenominators are reciprocals of lengths of the centered anduncentered regressor vectors To illustrate numerators areequal for VIF
119906(1205733) = 03889(13) = 11667 and for
VIF119888(1205733) = 03889(128) = 10889 but denominators are
reciprocals of X10158402X2= 3 and sum
5
119895=1(1198832119895
minus 1198832)2= 28
(119894) 119881perp Reference Model To rectify this malapropism we seek
in Definition 8 a model R119881as Reference This is found on
setting all off-diagonal elements of Z10158400Z0to zero excluding
the first row and column Its inverse Rminus1119881
is listed in Table 6The VF
119907s are ratios of diagonal elements of (Z1015840
0Z0)minus1 on the
left to diagonal elements of Rminus1119881
on the right namely
[VF119907(1205730) VF119907(1205731) VF119907(1205732) VF119907(1205733)]
= [09697 09991 09936 09943]
(25)
Against conventional wisdom it is counter-intuitive that[1205730 1205731 1205732 1205733] are all estimated with greater precision in the
119881perp-nonorthogonal model Z1015840
0Z0than in the 119881
perp-orthogonalmodel R
119881of Definition 8 As in Section 54 this serves again
to refute the tenet that ill-conditioning espouses inflated
Advances in Decision Sciences 11
Table 5 Matrices R119862and Rminus1
119862for checking 119862
perp-orthogonality in the orthogonal data X0 under model1198720
R119862
Rminus1119862
8 10686 25538 17704 01479 minus00170 minus00444 minus0029110686 8 03411 02365 minus00170 01273 0 025538 03411 8 05652 minus00444 0 01392 017704 02365 05652 8 minus00291 0 0 01314
Table 6 Matrices (Z10158400Z0)minus1 and Rminus1
119881for checking 119881
perp-orthogonality for the linked data under model1198720
(Z10158400Z0)minus1 Rminus1
119881
01504 minus00202 minus00461 minus00283 01551 minus00261 minus00530 minus00344minus00202 01293 minus000797 00053 minus00261 01294 00089 00058minus00461 minus00079 01422 minus00054 minus00530 00089 01431 00117minus00283 00053 minus00054 01319 minus00344 00058 00117 01326
variances for models in1198720 that is that VFs necessarily equal
or exceed unity(119894119894) 119862
perp Reference Model Further consider variances werethe Linked Data to be centered As in Definition 12(ii)we seek a Reference model R
119862such that C
119908= (W minus
M) is diagonal The required R119862
and its inverse arelisted in Table 7 Since variances in the Linked Data are[015040 012926 014220 013185] from Table 6 and Refer-ence values appear as diagonal elements of Rminus1
119862on the right
in Table 7 their ratios give the centered VF119888s as
[VF119888(1205730) VF119888(1205731) VF119888(1205732) VF119888(1205733)]
= [09920 10049 10047 10030]
(26)
We infer that changes in precision would be negligible ifinstead the Linked Data experiment was to be recast as a 119862perp-orthogonal experiment
As in Remark 15 the reference matrix R119862is constrained
by the first and second moments from X10158400X0and has free
parameters 1199091 1199092 1199093 for the (2 3) (2 4) (3 4) entries
The constraints for 119862perp-orthogonality are Rminus1
119862(2 3) =
Rminus1119862(2 4) = Rminus1
119862(3 4) = 0 Although this system of
equations can be solved with numerical software such asMaple the algorithm stated in Definition 12 easily yields thedirect solution given here
62 The Model 119872119908 For the Linked Data take 119884
119894= 12057311198851198941+
12057321198851198942
+ 12057331198851198943
+ 120598119894 1 le 119894 le 8 as Y = Z120573 + 120598
where the expected response at 1198851
= 1198852
= 1198853
= 0
is 119864(119884) = 00 and there is no occasion to translate theregressorsThe lower right (3 times 3) submatrix ofZ1015840
0Z0(4 times 4)
from 1198720is Z1015840Z its inverse is listed in Table 8 The ratios
of diagonal elements of (Z1015840Z)minus1 to reciprocals of diagonalelements of Z1015840Z are the conventional uncentered VIF
119906s
namely [10123 10247 10123] Equivalently scaling Z1015840Z tohave unit diagonals and inverting gives Cminus1 as in Table 8 itsdiagonal elements are the VIF
119906sThroughout Section 6 these
comprise the only correct interpretation of conventionalVIF119906s as genuine variance ratios
7 Case Study Body Fat Data
71 The Setting Body fat and morphogenic measurementswere reported for 119899 = 20 human subjects in Table D11 of[29 p717] Response and regressor variables are119884 amount ofbody fat X
1 triceps skinfold thickness X
2 thigh circumfer-
ence and X3 mid-arm circumference under119872
0 119884119894= 1205730+
12057311198831+12057321198832+12057331198833+ 120598119894 From the original data as reported
the condition number of X10158400X0is 1039931 and the VIF
119906s are
[2714800 4469419 630998 778703] On scaling columnsof X to equal column lengths X
119894 = 50 119894 = 1 2 3 the
resulting condition number of the revisedX10158400X0is 2 9696518
and the uncenteredVIF119906s are as before from invariance under
scalingAgainst complaints that regressors are remote from their
natural origins we center regressors to [0 0 0] as origin onsubtracting the minimal element from each column as inRemark 2 and scaling these to X
119894 = 50 119894 = 1 2 3
This is motivated on grounds that centered diagnostics apartfrom VIF
119888(1205730) are invariant under shifts and scalings from
Lemma 14 In addition under the new origin [0 0 0] 1205730
assumes prominence as the baseline response against whichchanges due to regressors are to be gaged The original dataare available on the publisherrsquos web page for [29]
Taking these shifted and scaled vectors as columnsof X = [X
1X2X3] in the revised X
0= [1
119899X]
values for X10158400X0
and its inverse are listed in Table 9The condition number of X1015840
0X0
is now 1136969 andits VIF
119906s are [6775617998742782174484] Additionally
from Theorem 4(ii) we recover the angle 1205790= 22592 deg to
quantify collinearity of regressors with the constant vectorIn addition the scaled and centered correlationmatrix for
X is
R = [
[
10 00847 08781
00847 10 01424
08781 01424 10
]
]
(27)
indicating strong linkage between mean-centered versions of(X1X3)Moreover the experimental design is neither119881perp nor
119862perp-orthogonal since neither the lower right (3 times 3) block of
X10158400X0nor the centered matrixC = (X1015840Xminus119899XX1015840) is diagonal
12 Advances in Decision Sciences
Table 7 Matrices R119862and Rminus1
119862for checking 119862
perp-orthogonality in the linked data under model1198720
R119862
Rminus1119862
8 13441 27337 17722 01516 minus00216 minus00484 minus0029113441 8 04593 02977 minus00216 01286 0 027337 04593 8 06056 minus00484 0 01415 017722 02977 06056 8 minus00291 0 0 01314
Table 8 Matrices (Z1015840Z)minus1 and Cminus1 for the linked data under model119872119908
(Z1015840Z)minus1 Cminus1
01265 minus00141 00015 10123 minus01125 00123minus00141 01281 minus00141 minus01125 10247 minus0112500015 minus00141 01265 00123 minus01125 10123
(119894) 119881perp Reference Model To compare this with a 119881
perp-or-thogonal design we turn again to Definition 8 Howeverevaluating the test criterion in Lemma 9(119894) shows that thisconfiguration is not commensurate with 119881
perp-orthogonalityso that the VF
119907s are undefined Comparisons with 119862
perp-orthogonal models are addressed next(119894119894) 119862perp Reference Model As in Definition 12(ii) the centering
matrix is
119899XX1015840 = [
[
188889 189402 187499
189402 189916 188007
187499 188007 186118
]
]
(28)
To check against 119862perp-orthogonality we obtain the matrix Wof Definition 12(ii) on replacing off-diagonal elements of thelower right (3 times 3) submatrix ofX1015840
0X0by corresponding off-
diagonal elements of 119899XX1015840 The result is the matrix R119862and
its inverse as given in Table 10 where diagonal elements ofRminus1119862
are the Reference variances The VF119888s found as ratios
of diagonal elements of (X10158400X0)minus1 relative to those of Rminus1
119862
are listed in Table 12 under G = [0 0 0] the indicator ofDefinition 12(i) for this case
72 Variance Factors and Linkage Traditional gages of ill-conditioning are patently absurd on occasion By conventionVIFs are ldquoall or nonerdquo wherein Reference models entailstrictly diagonal components In practice some regressorsare inextricably linked to get or even visualize orthogonalregressors may go beyond feasible experimental rangesrequire extrapolation beyond practical limits and challengecredulity Response functions so constrained are described in[30] as ldquopicket fencesrdquo In such cases taking C to be diagonalas its 119862perp Reference is moot at best an academic folly abettedin turn by default in standard software packages In shortconventional diagnostics here offer answers to irrelevantquestions Given pairs of regressors irrevocably linked weseek instead to assess effects on variances for other pairsthat could be unlinked by design These comments applyespecially to entries under G = [0 0 0] in Table 12
To proceed we define essential ill-conditioning as regres-sors inextricably linked necessarily to remain so and asnonessential if regressors could be unlinked by design As a
followup to Section 32 for X = 0 we infer that the constantvector is inextricably linked with regressors thus accountingfor essential ill-conditioning not removed by centeringcontrary to claims in [4] Indeed this is the essence ofDefinition 8
Fortunately these limitations may be remedied throughReference models adapted to this purpose This in turnexploits the special notation and trappings of Definition 12(i)as follows In addition toG = [0 0 0] we iterate for indicatorsG = [119892
12 11989213 11989223] taking values in [1 0 0] [0 1 0] [0 0 1]
[1 1 0] [1 0 1] [0 1 1] Values of VF119888s thus obtained are
listed in Table 12 To fix ideas the Reference R119862for G =
[1 0 0] that is for the constraint R119862(2 3) = X1015840
0X0(2 3) =
194533 is given in Table 11 together with its inverse Foundas ratios of diagonal elements of (X1015840
0X0)minus1 relative to those of
Rminus1119862
in Table 11 VF119888s are listed in Table 12 underG = [1 0 0]
In short R119862is ldquoas 119862
perp-orthogonal as it could berdquo given theconstraint Other VF
119888s in Table 12 proceed similarly For all
cases in Table 12 excluding G = [0 1 1] 1205730is estimated
with greater efficiency in the original model than any of thepostulated Reference models
As in Remark 15 the reference matrix R119862
for G =
[1 0 0] is constrained by X10158401X2
= 194533 to be retainedwhereas X1015840
1X3
= 1199091and X1015840
2X3
= 1199092are to be determined
from Rminus1119862(2 4) = Rminus1
119862(3 4) = 0 Although this system
can be solved numerically as noted the algorithm stated inDefinition 12 easily yields the direct solution given here
Recall 1198831 triceps skinfold thickness 119883
2 thigh cir-
cumference and 1198833 mid-arm circumference and from
(27) that (1198831 1198833) are strongly linked Invoking the cutoff
rule [24] for VIFs in excess of 40 it is seen for G =
[0 0 0] in Table 12 that VF119888(1205731) and VF
119888(1205733) are excessive
where [X1X2X3] are forced to be mutually uncorrelated
as Reference Remarkably similar values are reported forG isin [1 0 0] [ 0 0 1] [1 0 1] all gaged against intractableReference models wherein (119883
1 1198833) are postulated to be
uncorrelated however incrediblyOn the other hand VF
119888s for G isin [0 1 0] [1 1 0]
[0 1 1] in Table 12 are likewise comparable where (X1X3)
are allowed to remain linked at X10158401X3= 242362 Here the
VF119888s reflect negligible changes in efficiency of the estimates
Advances in Decision Sciences 13
Table 9 Matrices X10158400X0 and (X10158400X0)minus1 for the body fat data under model119872
0centered to minima and scaled to equal lengths
X10158400X0
(X10158400X0)minus1
20 194365 194893 192934 03388 minus01284 minus01482 minus00203194365 25 194533 242362 minus01284 07199 00299 minus06224194893 194533 25 196832 minus01482 00299 01711 minus00494192934 242362 196832 25 minus00203 minus06224 minus00494 06979
Table 10 Matrices R119862and Rminus1
119862for the body fat data adjusted to give off-diagonal elements the value 00 with code G = [0 0 0]
R119862
Rminus1119862
20 194365 194893 192934 05083 minus01590 minus01622 minus01510194365 25 189402 187499 minus01590 01636 0 0194893 189402 25 188007 minus01622 0 01664 0192934 187499 188007 25 minus01510 0 0 01565
in comparison with the originalX10158400X0 where [X
1X2X3] are
all linked In summary negligible changes in overall efficiencywould accrue on recasting the experiment so that (X
1X2)
and (X2X3) were pairwise uncorrelated
Table 12 conveys further useful information on notingthat VFs as ratios are multiplicative and divisible Forexample take the model codedG = [1 1 0] whereX1015840
1X2and
X10158401X3are retained at their initial values under X1015840
0X0 namely
194533 and 242362 from Table 9 with (X2X3) unlinked
Now taking G = [0 1 0] as reference we extract the VFson dividing elements in Table 12 at G = [0 1 0] by those ofG = [1 1 0] The result is
[VF (1205730) VF (120573
1) VF (120573
2) VF (120573
3)]
= [09196
0984810073
1005710282
1020810208
10208]
= [09338 10017 10072 10000]
(29)
8 Conclusions
Our goal is clarity in the actual workings of VIFs as ill-conditioning diagnostics thus to identify and rectifymiscon-ceptions regarding the use of VIFs for models X
0= [1119899X]
with intercept Would-be arbiters for ldquocorrectrdquo usage havedivided between VIF
119888s for mean-centered regressors and
VIF119906s for uncentered data We take issue with both Conven-
tional but clouded vision holds that (i) VIF119906s gage variance
inflation owing to nonorthogonal columns of X0as vectors
in R119899 (ii) VIFs ge 10 and (iii) models having orthogonalmean-centered regressors are ldquoidealrdquo in possessing VIF
119888s of
value unity Reprising Remark 3 these properties widely andcorrectly understood for models in119872
119908 appear to have been
expropriated without verification for models in1198720hence the
anomaliesAccordingly we have distinguished vector-space orthog-
onality (119881perp) of columns of X0 in contrast to orthogonality
(119862perp) of mean-centered columns of X A key feature is theconstruction of second-moment arrays as Reference models(R119881R119862) encoding orthogonalities of these types and their
Variance Factors as VF119907rsquos and VF
119888rsquos respectively Contrary
to convention we demonstrate analytically and through casestudies that (i1015840) VIF
119906s do not gage variance inflation (ii1015840)
VFs ge 10 need not hold whereas VF lt 10 representsVariance Deflation and (iii1015840) models having orthogonalcentered regressors are not necessarily ldquoidealrdquo In particularvariance deflation occurs when ill-conditioned data yieldsmaller variances than corresponding orthogonal surrogates
In short our studies call out to adjudicate the prolongeddispute regarding centered and uncentered diagnostics formodels in119872
0 Our summary conclusions follow
(S1) The choice of a model with or without intercept isa substantive matter in the context of each exper-imental paradigm That choice is fixed beforehandin a particular setting is structural and is beyondthe scope of the present study Snee and Marquardt[10] correctly assert that VIFs are fully creditedand remain undisputed in models without interceptthese quantities unambiguously retain the meaningsoriginally intended namely as ratios of variances togage effects of nonorthogonal regressors For modelswith intercept the choice is between centered anduncentered diagnostics which is to be weighed asfollows
(S2) ConventionalVIF119888s apply incompletely to slopes only
not necessarily grasping fully that a system is illconditioned Of particular concern is missing evi-dence on collinearity of regressors with the interceptOn the other hand if 119862
perp-orthogonality pertainsthen VF
119888s may supplant VIF
119888s as applicable to all of
[1205730 1205731 120573
119896] Moreover the latter may reveal that
VF119888(1205730) lt 10 as in cited examples and of special
import in predicting near the origin of the coordinatesystem
(S3) Our VF119907s capture correctly the concept apparently
intended by conventional VIF119906s Specifically if 119881perp-
orthogonality pertains then VF119907s as genuine ratios
of variances may supplant VIF119906s as now discredited
gages for variance inflation(S4) Returning to Section 32 we subscribe fully but for
different reasons to Belsleyrsquos and othersrsquo contention
14 Advances in Decision Sciences
Table 11 Matrices R119862and Rminus1
119862for the body fat data adjusted to give off-diagonal elements the value 00 except X10158401X2 code G = [1 0 0]
R119862
Rminus1119862
20 194365 194893 192934 04839 minus01465 minus01497 minus01510194365 25 194533 187499 minus01465 01648 minus00141 0194893 194533 25 188007 minus01497 minus00141 01676 0192934 187499 188007 25 minus01510 0 0 01565
Table 12 Summary VF119888s for the body fat data centered to minima and scaled to common lengths identified with varying G = [119892
12 11989213 11989223]
from Definition 12
Elements of G VF119888s
11989212
11989213
11989223
1205730
1205731
1205732
1205733
0 0 0 06665 43996 10282 44586
1 0 0 07002 43681 10208 44586
0 1 0 09196 10073 10282 10208
0 0 1 07201 43996 10073 43681
1 1 0 09848 10057 10208 10208
1 0 1 07595 43681 10003 43681
0 1 1 10248 10073 10073 10160
that uncentered VIF119906s are essential in assessing ill-
conditioned data In the geometry of ill-conditioningthese should be retained in the pantheon of regres-sion diagnostics not as now debunked gages ofvariance inflation but as more compelling angularmeasures of the collinearity of each column of X
0=
[1119899X1X2 X
119896] with the subspace spanned by
remaining columns(S5) Specifically the degree of collinearity of regressors
with the intercept is quantified by 1205790deg which
if small is known to ldquocorrupt the estimates of allparameters in the model whether or not the interceptis itself of interest and whether or not the data havebeen centeredrdquo [21 p90]This in turnwould appear toelevate 120579
0in importance as a critical ill-conditioning
diagnostic(S6) In contrast Theorem 4(iii) identifies conventional
VIF119888s with angles between a centered regressor and
the subspace spanned by remaining centered regres-sors For example
1205791= arccos([
[
1 minus1
VIFc ( ldquo1205731)]
]
12
) (30)
is the angle between the centered vector Z1= B119899X1
and the remaining centered vectors These lie in thelinear subspace of R119899 comprising the orthogonalcomplement to 1
119899isin R119899 and thus bear on the
geometry of ill-conditioning in this subspace(S7) Whereas conventional VIFs are ldquoall or nonerdquo to
be gaged against strictly diagonal components ourReference models enable unlinking what may beunlinked while leaving other pairs of regressorslinked This enables us to assess effects on variances
attributed to allowable partial dependencies amongsome but not other regressors
In conclusion the foregoing summary points to limita-tions in modern software packages Minitab v15 119878119875119878119878 v19and 119878119860119878 v92 are standard software packages which computecollinearity diagnostics returning with the default optionsthe centered VIFs VIF
119888(120573119895) 1 le 119895 le 119896 To compute
the uncentered VIF119906s one needs to define a variable for
the constant term and perform a linear regression using theoptions of fitting without an intercept On the other handcomputations for VF
119888s and VF
119907s as introduced here require
additional if straightforward programming for exampleMaple and PROC IML of the SAS System as utilized here
References
[1] D A Belsley ldquoDemeaning conditioning diagnostics throughcenteringrdquo The American Statistician vol 38 no 2 pp 73ndash771984
[2] E R Mansfield and B P Helms ldquoDetecting multicollinearityrdquoThe American Statistician vol 36 no 3 pp 158ndash160 1982
[3] S D Simon and J P Lesage ldquoThe impact of collinearityinvolving the intercept term on the numerical accuracy ofregressionrdquo Computer Science in Economics and Managementvol 1 no 2 pp 137ndash152 1988
[4] D W Marquardt and R D Snee ldquoRidge regression in practicerdquoThe American Statistician vol 29 no 1 pp 3ndash20 1975
[5] K N Berk ldquoTolerance and condition in regression computa-tionsrdquo Journal of the American Statistical Association vol 72 no360 part 1 pp 863ndash866 1977
[6] A E Beaton D Rubin and J Barone ldquoThe acceptability ofregression solutions another look at computational accuracyrdquoJournal of the American Statistical Association vol 71 no 353pp 158ndash168 1976
Advances in Decision Sciences 15
[7] R B Davies and B Hutton ldquoThe effect of errors in theindependent variables in linear regressionrdquo Biometrika vol 62no 2 pp 383ndash391 1975
[8] D W Marquardt ldquoGeneralized inverses ridge regressionbiased linear estimation and nonlinear estimationrdquo Technomet-rics vol 12 no 3 pp 591ndash612 1970
[9] G W Stewart ldquoCollinearity and least squares regressionrdquoStatistical Science vol 2 no 1 pp 68ndash84 1987
[10] R D Snee and D W Marquardt ldquoComment collinearitydiagnostics depend on the domain of prediction themodel andthe datardquo The American Statistician vol 38 no 2 pp 83ndash871984
[11] R HMyers Classical andModern Regression with ApplicationsPWS-Kent Boston Mass USA 2nd edition 1990
[12] D A Belsley E Kuh and R E Welsch Regression DiagnosticsIdentifying Influential Data and Sources of Collinearity JohnWiley amp Sons New York NY USA 1980
[13] A van der Sluis ldquoCondition numbers and equilibration ofmatricesrdquo Numerische Mathematik vol 14 pp 14ndash23 1969
[14] G C S Wang ldquoHow to handle multicollinearity in regressionmodelingrdquo Journal of Business Forecasting Methods amp Systemvol 15 no 1 pp 23ndash27 1996
[15] G C S Wang and C L Jain Regression Analysis Modeling ampForecasting Graceway Flushing NY USA 2003
[16] N Adnan M H Ahmad and R Adnan ldquoA comparative studyon some methods for handling multicollinearity problemsrdquoMatematika UTM vol 22 pp 109ndash119 2006
[17] D R Jensen and D E Ramirez ldquoAnomalies in the foundationsof ridge regressionrdquo International Statistical Review vol 76 pp89ndash105 2008
[18] D R Jensen and D E Ramirez ldquoSurrogate models in ill-conditioned systemsrdquo Journal of Statistical Planning and Infer-ence vol 140 no 7 pp 2069ndash2077 2010
[19] P F Velleman and R E Welsch ldquoEfficient computing ofregression diagnosticsrdquo The American Statistician vol 35 no4 pp 234ndash242 1981
[20] R F Gunst ldquoRegression analysis with multicollinear predictorvariables definition detection and effectsrdquo Communications inStatistics A vol 12 no 19 pp 2217ndash2260 1983
[21] G W Stewart ldquoComment well-conditioned collinearityindicesrdquo Statistical Science vol 2 no 1 pp 86ndash91 1987
[22] D A Belsley ldquoCentering the constant first-differencing andassessing conditioningrdquo in Model Reliability D A Belsley andE Kuh Eds pp 117ndash153 MIT Press Cambridge Mass USA1986
[23] G Smith and F Campbell ldquoA critique of some ridge regressionmethodsrdquo Journal of the American Statistical Association vol 75no 369 pp 74ndash103 1980
[24] R M Orsquobrien ldquoA caution regarding rules of thumb for varianceinflation factorsrdquoQualityampQuantity vol 41 no 5 pp 673ndash6902007
[25] R F Gunst ldquoComment toward a balanced assessment ofcollinearity diagnosticsrdquo The American Statistician vol 38 pp79ndash82 1984
[26] F A Graybill Introduction to Matrices with Applications inStatistics Wadsworth Belmont Calif USA 1969
[27] D Sengupta and P Bhimasankaram ldquoOn the roles of observa-tions in collinearity in the linearmodelrdquo Journal of the AmericanStatistical Association vol 92 no 439 pp 1024ndash1032 1997
[28] J O van Vuuren and W J Conradie ldquoDiagnostics and plotsfor identifying collinearity-influential observations and for clas-sifying multivariate outliersrdquo South African Statistical Journalvol 34 no 2 pp 135ndash161 2000
[29] R R HockingMethods and Applications of LinearModels JohnWiley amp Sons Hoboken NJ USA 2nd edition 2003
[30] R R Hocking and O J Pendleton ldquoThe regression dilemmardquoCommunications in Statistics vol 12 pp 497ndash527 1983
2 Advances in Decision Sciences
as ratios of variances In contrast a yet unresolved debatesurrounds the choice of conditioning diagnostics for mod-els in 119872
0 namely between uncentered regressors giving
VIF119906s for [120573
0 1205731 120573
119896] and regressors X
119894centered to their
means giving VIF119888s for [120573
1 120573
119896] Specifically VIF
119906(120573119895) =
119906119895119895119907119895119895 119895 = 0 1 119896 on taking U = X1015840
0X0and V =
(X10158400X0)minus1 In contrast letting R = [119877
119894119895] be the correlation
matrix for = [1205731 120573
119896]1015840
and Rminus1 = [119877119894119895] its inverse then
centered versions are VIF119888(120573119895) = 119877119895119895 1 le 119895 le 119896 for slopes
onlyIt is seen here that (i) these differ profoundly in regard
to their respective concepts of orthogonality (ii) objectivesand meanings differ accordingly (iii) sharp divisions trace tomuddling these concepts and (iv) this distinction assumesa pivotal role here Over time a large body of widely heldbeliefs conjectures intrinsic propositions and conventionalwisdom has accumulated much from flawed logic some tobe dispelled here The key to our studies is that VIFs tobe meaningful must compare actual variances to those ofan ldquoidealrdquo second-moment matrix as reference the latter toembody the conjectured type of orthogonality This differsbetween centered and uncentered diagnostics and for bothtypes requires the reference matrix to be feasible An outlinefollows
Our undertaking consists essentially of four partsThe first is a literature survey of some essentials of ill-conditioning to include the divide between centered anduncentered diagnostics and conventional VIFs The struc-ture of orthogonality is examined next Anomalies inusage meaning and interpretation of conventional VIFs areexposed analytically and through elementary and transparentcase studies Long-standing but ostensible foundations inturn are reassessed and corrected through the constructionof ldquoReference modelsrdquo These are moment arrays constrainedto encode orthogonalities of the types considered Neitherarray returns the conventional VIF
119906s nor VIF
119888s Direct
rules for finding the amended Reference models are givenpreempting the need for constrained numerical algorithmsFinally studies of ill-conditioned data from the literature arereexamined in light of these findings
2 Preliminaries
21 Notation Designate byR119901 the Euclidean 119901-space Matri-ces and vectors are set in bold type the transpose and inverseof A are A1015840 and Aminus1 and A(119894119895) refers on occasion to the(119894 119895) element of A Special arrays are the identity I
119901 the unit
vector 1119901= [1 1 1]
1015840isin R119901 the diagonal matrix D
119886=
Diag(1198861 119886
119901) and B
119899= (I119899minus 119899minus1111989911015840119899) as idempotent of
rank 119899 minus 1 The Frobenius norm of x isin R119901 is x = (x1015840x)12ForX of order (119899times119901) and rank 119901 le 119899 a generalized inverse isdesignated as Xdagger its ordered singular values as 120590(X) = 120585
1ge
1205852
ge sdot sdot sdot ge 120585119901
gt 0 and by S119901(X) sub R119899 the subspace of
R119899 spanned by the columns of X By accepted conventionits condition number is 119888
2(X) = 120585
1120585119901 specifically 119888
2(X) =
[1198881(X1015840X)]
12 For our model Y = X120573 + 120598 with dispersion
119881(120598)=1205902I119899 we take 120590
2= 10 unless stated otherwise since
variance ratios are scale invariant
22 Case Study 1 A First Look That anomalies pervadeconventional VIF
119906s and VIF
119888s may be seen as follows Given
1198720 119884119894= 1205730+12057311198831+12057321198832+120598119894 the designX
0= [15X1X2]
of order (5 times 3)U=X10158400X0 and its inverse V=(X1015840
0X0)minus1 as in
X10158400= [
[
1 1 1 1 1
0 05 05 1 1
minus1 1 1 0 0
]
]
U = [
[
5 3 1
3 25 1
1 1 3
]
]
V = [
[
072222 minus088889 005556
minus088889 155556 minus022222
005556 minus022222 038889
]
]
(1)
Conventional centered and uncentered VIFs are
[VIF119888(1205731) VIF
119888(1205732)] = [10889 10889] (2)
[VIF119906(1205730)VIF
119906(1205731)VIF
119906(1205732)]= [36111 38889 11667]
(3)
respectively the former for slopes only and the latter takingreciprocals of the diagonals of X1015840
0X0as reference
A Critique The following points are basic Note first thatmodel (1) is nonorthogonal in both the centered and uncen-tered regressors
Remark 1 The VIF119906s are not ratios of variances and thus fail
to gage relative increases in variances owing to nonorthog-onal columns of X
0 This follows since the first row and
column of the second-moment matrix X10158400X0= U are fixed
and nonzero by design so that taking X10158400X0to be diagonal as
reference cannot be feasible
Remark 1 runs contrary to assertions throughout the liter-ature In consequence for models in 119872
0the mainstay VIF
119906s
in recent vogue are largely devoid of meaning Subsequentlythese are identified instead with angles quantifying degrees ofmulticollinearity among the regressors
On the other hand feasible Reference models for allparameters as identified later for centered and uncentereddata in Definition 13 Section 42 give
[VF119888(1205730)VF119888(1205731)VF119888(1205732)]= [09913 10889 10889]
(4)
[VF119907(1205730)VF119907(1205731)VF119907(1205732)]= [07704 08889 08889]
(5)
in lieu of conventional VIF119888s and VIF
119906s respectively The
former comprise corrected VIF119888s extended to include the
intercept Both sets in fact are genuine variance inflationfactors as ratios of variances in themodel (1) relative to thosein Reference models feasible for centered and for uncenteredregressors respectively
This example flagrantly contravenes conventional wis-dom (i) variances for slopes are inflated in (4) but for the
Advances in Decision Sciences 3
intercept deflated in comparison with the feasible centeredreference Specifically 120573
0is estimated here with greater
efficiency VF119888(1205730) = 09913 in the initial design (1) despite
nonorthogonality of its centered regressors (ii) Variances areuniformly smaller in the model (1) than in its feasible uncen-tered reference from (5) thus exhibiting Variance Deflationdespite nonorthogonality of the uncentered regressors Afull explication of the anatomy of this study appears laterIn support we next examine technical details needed insubsequent developments
23 Types of Orthogonality The ongoing divergence betweencentered and uncentered diagnostics traces in part to mean-ing ascribed to orthogonality Specifically the orthogonalityof columns of X in 119872
119908refers unambiguously to the vector-
space concept X119894perp X119895 that is X1015840
119894X119895
= 0 as does thenotion of collinearity of regressors with the constant vectorin 1198720 We refer to this as 119881-orthogonality in short 119881perp In
contrast nonorthogonality in 1198720often refers instead to the
statistical concept of correlation among columns of X whenscaled and centered to their means We refer to its negationas 119862-orthogonality or 119862
perp Distinguishing between thesenotions is fundamental as confusion otherwise is evident Forexample it is asserted in [11 p125] that ldquothe simple correlationcoefficient 119903
119894119895does measure linear dependency between 119909
119894
and 119909119895in the datardquo
24 The Models 1198720 Consider 119872
0 Y = X
0120579 + 120598 X
0=
[1119899X] 1205791015840 = [120573
01205731015840] together with
X10158400X0= [
119899 119899X1015840
119899X X1015840X] (X1015840
0X0)minus1
= [11988800
c01
c101584001
G ] (6)
where 119899X1015840 = 11015840119899X 11988800
= [119899 minus 1198992X1015840(X1015840X)
minus1X]minus1 C =
(X1015840Xminus119899XX1015840) =X1015840B119899X andG = Cminus1 from block-partitioned
inverses In later sections we often will denote the mean-centeringmatrix 119899XX1015840 byM In particular the centered formX rarr Z = B
119899X arises exclusively in models with intercept
with or without reparametrizing (1205730120573) rarr (120572120573) in the
mean-centered regressors where 120572 = 1205730+12057311198831+ sdot sdot sdot + 120573
119896119883119896
Scaling Z to unit column lengths gives Z1015840Z in correlationform with unit diagonals
3 Historical Perspective
Our objective is an overdue revision of the tenets of varianceinflation in regression To provide context we next surveyextenuating issues from the literature Direct quotations areintended not to subvert stances taken by the cited authorsModels in 119872
0are at issue since as noted in [10] centered
diagnostics have no place in119872119908
31 Background Aspects of ill-conditioning span a consid-erable literature over many decades Regarding Y = X120573 +120598 scaling columns of X to equal lengths approximatelyminimizes the condition number 119888
2(X) [12 p120] based on
[13] Nonetheless 1198882(X) is cast in [9] as a blunt instrument for
ill-conditioning prompting the need for VIFs and other localconstructs Stewart [9] credits VIFs in concept to Daniel andin name to Marquardt
Ill-conditioning points beyond OLS in view of difficultiescited earlier Remedies proffered in [14 15] include trans-forming variables adding new data and deleting variable(s)after checking critical features of the reduced model Otherintended palliatives include Ridge and Partial Least Squaresas compared in [16] Principal Components regression Sur-rogate models as in [17] All are intended to return reducedstandard errors at the expense of bias Moreover Surrogatesolutions more closely resemble those from an orthogonalsystem than Ridge [18] Together the foregoing and otheroptions comprise a substantial literature as collateral to butapart from the present study
32 To Center Advocacy for centering includes the follow-ing
(i) VIFs often are defined categorically as the diagonalsof the inverse of the correlation matrix of scaled andcentered regressors see [4 9 11 19] for exampleThese are VIF
119888s widely adopted without justification
as default to the exclusion of VIF119906s
(ii) It is asserted [4] that ldquocentering removes the nones-sential ill-conditioning thus reducing the varianceinflation in the coefficient estimatesrdquo
(iii) Centering is advocated when predictor variables arefar removed from origins on the basic data scales [1011]
33 Not to Center Advocacy for uncentered diagnosticsincludes the following caveats from proponents of centering
(i) Uncentered data should be examined only if an esti-mate of the intercept is of interest [9 10 20]
(ii) ldquoIf the domain of prediction includes the full rangefrom the natural origin through the range of data thecollinearity diagnostics should not bemean centeredrdquo[10 p84]
Other issues against centering derive in part from numer-ical analysis and work by Belsley
(i) Belsley [1] identifies 1198882(X) for a system Y = X120573+120598 as
ldquothe potential relative change in the LS solution b thatcan result from a small relative change in the datardquo
(ii) These require structurally interpretable variables asldquoones whose numerical values and (relative) differ-ences derive meaning and interpretability fromknowledge of the structure of the underlying lsquorealityrsquobeing modeledrdquo [1 p75]
(iii) ldquoThere is no such thing as lsquononessentialrsquo ill-condi-tioningrdquo and ldquomean-centering can remove from thedata the information needed to assess conditioningcorrectlyrdquo [1 p74]
(iv) ldquoCollinearity with the intercept can quite generallycorrupt the estimates of all parameters in the model
4 Advances in Decision Sciences
whether or not the intercept is itself of interest andwhether or not the data have been centeredrdquo [21 p90]
(v) An example [22 p121] gives severely ill-conditioneddata perfectly conditioned in centered form ldquocenter-ing alters neitherrdquo inflated variances nor extremelysensitive parameter estimates in the basic data more-over ldquodiagnosing the conditioning of the centereddata (which are perfectly conditioned) would com-pletely overlook this situation whereas diagnosticsbased on the basic data would notrdquo
(vi) To continue from [22] ill-conditioning persists inthe propagation of disturbances in that ldquoa 1 percentrelative change in theX
119894rsquos results in over a 40 relative
change in the estimatesrdquo despite perfect conditioningin centered form and ldquoknowledge of the effect ofsmall relative changes in the centered data is notmeaningful for assessing the sensitivity of the basic LSproblemrdquo since relative changes and their meaningsin centered anduncentered data oftendiffermarkedly
(vii) Regarding choice of origin ldquo(1) the investigator mustbe able to pick an origin against which small relativechanges can be appropriately assessed and (2) it is thedata measured relative to this origin that are relevantto diagnosing the conditioning of the LS problemrdquo[22 p126]
Other desiderata pertain
(i) ldquoBecause rewriting the model (in standardized vari-ables) does not affect any of the implicit estimates ithas no effect on the amount of information containedin the datardquo [23 p76]
(ii) Consequences of ill-advised diagnostics can besevere Degraded numerical accuracy traces to nearcollinearity of regressors with the constant vectorIn short centering fails to prevent a loss in numer-ical accuracy centered diagnostics are unable todiscern these potential accuracy problems whereasuncentered diagnostics are seen to work well Twowidely used statistical packages SAS and SPSS-Xfail to detect this type of ill-conditioning throughuse of centered diagnostics and thus return highlyinaccurate coefficient estimates For further detailssee [3]
On balance formodels in1198720the jury is out regarding the
use of centered or uncentered diagnostics to include VIFsEven Belsley [1] (and elsewhere) concedes circumstanceswhere centering does achieve structurally interpretable mod-els Of note is that the foregoing listed citations to Belsleyapply strictly to condition numbers 119888
2(X) other purveyors of
ill-conditioning specifically VIFs are not treated explicitly
34 A Synthesis It bears notice that (i) the origin how-ever remote from the cluster of regressors is essential forprediction and (ii) the prediction variance is invariant toparametrizing in centered or uncentered forms Additionalremarks are codified next for subsequent referral
Remark 2 Typically 119884 represents response to input vari-ables 119883
1 1198832 119883
119896 In a controlled experiment levels
are determined beforehand by subject-matter considerationsextraneous to the experiment to include minimal valuesHowever remote the origin on the basic data scales it seemsinformative in such circumstances to identify the originwith these minima In such cases the intercept is often ofsingular interest since 120573
0is then the standard against which
changes in 119884 are to be gaged as regressors vary We adopt thisconvention in subsequent studies from the archival literature
Remark 3 In summary the divergence in views whetherto center or not appears to be that critical aspects of ill-conditioning known and widely accepted for models in119872
119908
have been expropriated over decades mindlessly and withoutverification to apply point-by-point for models in119872
0
4 The Structure of Orthogonality
This section develops the foundations for Reference modelscapturing orthogonalities of types 119881
perp and 119862perp Essential
collateral results are given in support as well
41 Collinearity Indices Stewart [9] reexamines ill-condi-tioning from the perspective of numerical analysis Detailsfollow where X is a generic matrix of regressors havingcolumns x
119895 1 le 119895 le 119901 and Xdagger = (X1015840X)
minus1X1015840 is thegeneralized inverse of note having xdagger
119895 1 le 119895 le 119901 as
its typical rows Each corresponding collinearity index 120581 isdefined in [9 p72] as
120581119895=10038171003817100381710038171003817x119895
10038171003817100381710038171003817sdot10038171003817100381710038171003817xdagger119895
10038171003817100381710038171003817 1 le 119895 le 119901 (7)
constructed so as to be scale invariant Observe that xdagger1198952 is
found along the principal diagonal of [(Xdagger)(Xdagger)1015840] = (X1015840X)minus1
WhenX(119899times119896) inX0= [1119899X] is centered and scaled Section
3 of [9] shows that the centered collinearity indices 120581119888satisfy
1205812
119888119895= VIF
119888(120573119895) 1 le 119895 le 119896 In 119872
0with parameters (120573
0
120573) values corresponding to xdagger1198952 from Xdagger
0= (X10158400X0)minus1X10158400
lie along the principal diagonal of (X10158400X0)minus1 the uncentered
collinearity indices 120581119906now satisfy 120581
2
119906119895= x1198952sdot xdagger1198952
119895 =
0 1 119896 In particular since x0
= 1119899 we have 120581
2
1199060=
119899xdagger02 Moreover in119872
0it follows that the uncentered VIF
119906s
are squares of the collinearity indices that is VIF119906(120573119895) =
1205812
119906119895 119895 = 0 1 119896 Note the asymmetry that VIF
119888s exclude
the intercept in contrast to the inclusive VIF119906s That the
label Variance Inflation Factors for the latter is a misnomer iscovered in Remark 1 Nonetheless we continue the familiarnotation VIF
119906(120573119895) = 1205812
119906119895 119895 = 0 1 119896
Transcending the essential developments of [9] are con-nections between collinearity indices and angles between
Advances in Decision Sciences 5
subspaces To these ends choose a typical x119895in X0 and
rearrange X0as [x119895X[119895]] We next seek elements of
Q1015840119895(X10158400X0)minus1
Q119895= [
x1015840119895x119895
x1015840119895X[119895]
X1015840[119895]x119895X1015840[119895]X[119895]
]
minus1
(8)
as reordered by the permutation matrix Q119895 From the clock-
wise rule the (1 1) element of the inverse is
[x1015840119895(I119899minus P119895) x119895]minus1
= [x1015840119895x119895minus x1015840119895P119895x119895]minus1
=10038171003817100381710038171003817xdagger119895
10038171003817100381710038171003817
2 (9)
in succession for each 119895 = 0 1 119896 where P119895
=
X[119895][X1015840[119895]X[119895]]minus1X1015840[119895]
is the projection operator onto the sub-space S
119901(X[119895]) sub R119899 spanned by the columns of X
[119895] These
in turn enable us to connect 1205812119906119895 119895 = 0 1 119896 and similarly
for centered values 1205812119888119895 119895 = 1 119896 to the geometry of ill-
conditioning as follows
Theorem 4 For models in 1198720let VIF
119906(120573119895) = 120581
2
119906119895 119895 = 0
1 119896 be conventional uncentered VIF119906s in terms of Stew-
artrsquos [9] uncentered collinearity indices These in turn quantifythe extent of collinearities between subspaces through angles (in119889119890119892 as follows
(i) Angles between (x119895S119901(X[119895])) are given by 120579
119895=
arccos[(1minus11205812uj)12
] in succession for 119895 = 0 1 119896
(ii) In particular 1205790
= arccos[(1 minus 11205812u0)12
] quantifiesthe degree of collinearity between the regressors and theconstant vector
(iii) Similarly let B119899X = Z = [z
1 z
119896] be regressors
centered to their means rearrange as [z119895Z[119895]] and let
VIF119888(120573119895) = 120581
2
119888119895 119895 = 1 119896 be centered VIF
119888s in
terms of Stewartrsquos centered collinearity indices Thenangles (in deg) between (z
119895S119901(Z[119895])) are given by
120579119895= arccos[(1 minus 11205812cj)
12] 1 le j le k
Proof From the geometry of the right triangle formed by(x119895P119895x119895) the squared lengths satisfy x
1198952
= P119895x1198952+
x119895minus P119895x1198952 where x
119895minus P119895x1198952
= RS119895is the residual sum
of squares from the projection It follows that the principalangle between (x
119895P119895x119895) is given by
cos (120579119895)=
x1015840119895P119895x119895
10038171003817100381710038171003817x119895
10038171003817100381710038171003817sdot10038171003817100381710038171003817P119895x119895
10038171003817100381710038171003817
=
10038171003817100381710038171003817P119895x119895
1003817100381710038171003817100381710038171003817100381710038171003817x119895
10038171003817100381710038171003817
=radic1 minusRS119895
10038171003817100381710038171003817x119895
10038171003817100381710038171003817
2=radic1 minus
1
1205812119906119895
(10)
for 119895 = 0 1 119896 to give conclusion (i) Conclusion(ii) follows on specializing (x
0P0x0) with x
0= 1119899and
P0= X(X1015840X)
minus1X1015840 Conclusion (iii) follows similarly from thegeometry of the right triangle formed by (z
119895P119895z119895) where P
119895
now is the projection operator onto the subspace S119901(Z[119895]) sub
R119899 spanned by the columns of Z[119895] to complete our proof
Remark 5 Rules of thumb in common use for problematicVIFs include those exceeding 10 or even 4 see [11 24] forexample In angular measure these correspond respectivelyto 120579119895lt 18435 deg and 120579
119895lt 300 deg
42 Reference Models We seek as Reference feasible modelsencoding orthogonalities of types 119881
perp and 119862perp The keys
are as follows (i) to retain essentials of the experimentalstructure and (ii) to alter what may be changed to achieveorthogonality For a model in119872
119908with moment matrix X1015840X
our opening paragraph prescribes as reference the modelR119881
= D = Diag(11988911 119889
119901119901) as diagonal elements of
X1015840X for assessing 119881perp-orthogonality Moreover on scaling
columns of X to equal lengths R119881is perfectly conditioned
with 1198881(R119881) = 10 In addition every model in 119872
119908clearly
conforms with its Reference in the sense that R119881is positive
definite as distinct from models in1198720to follow
Consider again models in 1198720as in (6) let C = (X1015840X minus
M) withM = 119899XX1015840 as the mean-centering matrix and againletD = Diag(119889
11 119889
119896119896) comprise the diagonal elements of
X1015840X
(i) 119881perpReference ModelThe uncentered VIF119906s in119872
0 defined
as ratios of diagonal elements of (X10158400X0)minus1 to reciprocals of
diagonal elements of X10158400X0 appear to have seen exclusive
usage apparently in keepingwithRemark 3However the fol-lowing disclaimermust be registered as the formal equivalentof Remark 1
Theorem 6 Statements that conventional VIF119906s quantify var-
iance inflation owing to nondiagonalX10158400X0are false for models
in1198720having X = 0
Proof Since the Reference variances are reciprocals of diag-onal elements of X1015840
0X0 this usage is predicated on the false
postulate that X10158400X0can be diagonal for X = 0 Specifically
[1119899X] are linearly independent that is 11015840
119899X = 0 if and only
if X has been mean centered beforehand
To the contrary Gunst [25] purports to show thatVIF119906(1205730) registers genuine variance inflation namely the
price to be paid in variance for designing an experimenthaving X = 0 as opposed to X = 0 Since variances forintercepts are Var(sdot | X = 0) = 120590
2119899 and Var(sdot | X = 0) =
120590211988800
ge 1205902119899 from (6) their ratio is shown in [25] to be
1205812
1199060= 11989911988800
ge 1 in the parlance of Section 23 We concede thisto be a ratio of variances but to the contrary not a VIF sincethe parameters differ In particular Var(120573
0| X = 0) = 120590
211988800
whereas Var( | X = 0) = 1205902119899 with 120572 = 120573
0+ 12057311198831+
sdot sdot sdot + 120573119896119883119896in centered regressors Nonetheless we still find
the conventional VIF119906(1205730) to be useful for purposes to follow
Remark 7 Section 3 highlights continuing concern in regardto collinearity of regressors with the constant vectorTheorem 4(ii) and expression (10) support the use of 120579
0=
arccos[(1 minus 11205812u0)12
] as an informative gage on the extent of
6 Advances in Decision Sciences
this occurrence Specifically the smaller the angle the greaterthe extent of such collinearity
Instead of conventional VIF119906s given the foregoing dis-
claimer we have the following amended version as Referencefor uncentered diagnostics altering whatmay be changed butleaving X = 0 intact
Definition 8 Given a model in 1198720with second-moment
matrixX10158400X0 the amendedReferencemodel for assessing119881perp-
orthogonality is R119881
= [ 119899 119899X1015840
119899X D] with D = Diag(119889
11 119889
119896119896)
as diagonal elements of X1015840X provided that R119881is positive
definite We identify a model to be 119881perp-orthogonal when
X10158400X0= R119881
As anticipated a prospective R119881
fails to conform toexperimental data if not positive definite These and furtherprospects are covered in the following where 120601
119894designates
the angle between (1119899X119894) 119894 = 1 119896
Lemma 9 Take R119881
as a prospective Reference for 119881perp-
orthogonality
(i) In order that R119881maybe positive definite it is necessary
that 119899X1015840Dminus1X lt 1 that is that 119899sum119896
119894=1(1198832
119894119889119894119894) lt 1
(ii) Equivalently it is necessary thatsum119896119894=1
cos2(120601119894) lt 1with
120601119894as the angle between (1
119899Xi) 119894 = 1 119896
(iii) The Reference variance for 1205730is Var(120573
0| X1015840X = D) =
[119899(1 minus 119899X1015840Dminus1X)]minus1
(iv) The Reference variances for slopes are given by
Var (120573119894| X1015840X = D) =
1
119889119894119894
[1 +1205741198832
119894
119889119894119894
] 1 le 119894 le 119896 (11)
where 120574 = 119899(1 minus 119899X1015840Dminus1X)
Proof The clockwise rule for determinants gives |R119881| =
|D|[119899 minus 1198992X1015840Dminus1X] Conclusion (i) follows since |D| gt 0
The computation
cos (120601119894) =
11015840119899X119894
1003817100381710038171003817111989910038171003817100381710038171003817100381710038171003817X119894
1003817100381710038171003817=
11015840119899X119894
radic119899119889119894119894
= radic1198991198832
119894
119889119894119894
(12)
in parallel with (10) gives conclusion (ii) Using the clockwiserule for block-partitioned inverses the (1 1) element of Rminus1
119881
is given by conclusion (iii) Similarly the lower right block ofRminus1119881 of order (119896 times 119896) is the inverse of H = D minus 119899XX1015840 On
identifying a = b = X 120572 = 119899 and 119886lowast
119894= 119886119894119889119894119894 in Theorem
833 of [26] we have that Hminus1 = Dminus1 + 120574Dminus1XX1015840Dminus1Conclusion (iv) follows on extracting its diagonal elementsto complete our proof
Corollary 10 For the case 119896 = 2 in order that R119881maybe
positive definite it is necessary that 1206011+ 1206012gt 90 deg
Proof Beginning with Lemma 9(ii) compute
cos (1206011+ 1206012)
= cos1206011cos1206012minus sin120601
1sin1206012
= cos1206011cos1206012minus[1minus(cos2120601
1+cos2120601
2)+cos2120601
1cos21206012]12
(13)
which is lt0 when 1 minus (cos21206011+ cos2120601
2) gt 0
Moreover the matrix R119881itself is intrinsically ill conditioned
owing to X = 0 its condition number depending on X Toquantify this dependence we have the following wherecolumns of X have been standardized to common lengths|X119894| = 119888 1 le 119894 le 119896
Lemma 11 Let R119881= [ 119899 T1015840
T 119888I119896
] as in Definition 8 with T = 119899XD = 119888I
119896 and 119888 gt 0
(i) The ordered eigenvalues of R119881are [120582
1 119888 119888 120582
119901]
where 119901 = 119896 + 1 and (1205821 120582119901) are the roots of 120582 isin
[(119899 + 119888) plusmn radic(119899 minus 119888)2+ 41205912]2 and 120591
2= T1015840T
(ii) The roots are positive andR119881is positive definite if and
only if (119899119888 minus 1205912) gt 0
(iii) If R119881
is positive definite its condition number is1198881(R119881) = 1205821120582119901and is increasing in 120591
2
Proof Eigenvalues are roots of the determinantal equation
Det [(119899 minus 120582) T1015840T (119888 minus 120582) I
119896
]
= 0 lArrrArr (119888 minus 120582)119896minus1
[(119899 minus 120582) (119888 minus 120582) minus T1015840T] = 0
(14)
from the clockwise rule giving (119896 minus 1) values 120582 = 119888 and tworoots of the quadratic equation 120582
2minus (119899 + 119888)120582 + 119899119888 minus 120591
2= 0
to give conclusion (i) Conclusion (ii) holds since the productof roots of the quadratic equation is (119899119888 minus 120591
2) and the greater
root is positive Conclusion (iii) follows directly to completeour proof
(119894119894) 119862perp Reference Model As noted the notion of 119862
perp-or-thogonality applies exclusively for models in 119872
0 Accord-
ingly as Reference we seek to alter X1015840X so that the matrixcomprising sums of squares and sums of products of devia-tions from means thus altered is diagonal To achieve thiscanon of 119862
perp-orthogonality and to anticipate notation forsubsequent use we have the following
Definition 12 (i) For a model in 1198720with second-moment
matrix X10158400X0and inverse as in (6) let C
119908= (W minus M) and
identify an indicator vector G = [11989212 11989213 119892
1119896 11989223
1198922119896 119892
119896minus1119896] of order ((119896 minus 1) times (119896 minus 2)) in lexicographic
order where 119892119894119895= 0 119894 = 119895 if the (119894 119895) element of C
119908is zero
and 119892119894119895= 1 119894 = 119895 if the (119894 119895) element ofC
119908is (X1015840119894X119895minus119899119883119894119883119895)
that is unchanged from C = (X1015840X minus 119899XX1015840) = Z1015840Z withZ = B
119899X as the regressors centered to their means
Advances in Decision Sciences 7
(ii) In particular the Reference model in1198720for assessing
119862perp-orthogonality is R
119862= [ 119899 119899X
1015840
119899X W] such that C
119908= (W minus
M) and its inverse G from (6) are diagonal that is takingW = [119908
119894119895] such that 119908
119894119894= 119889119894119894 1 le 119894 le 119896 and 119908
119894119895=
119898119894119895 for all 119894 = 119895 or equivalently G = [0 0 0] In this
case we identify the model to be 119862perp-orthogonal
Recall that conventional VIF119888s for [120573
1 1205732 120573
119896] are
ratios of diagonal elements of (Z1015840Z)minus1 to reciprocals ofthe diagonal elements of the centered Z1015840Z Apparently thisrests on the logic that 119862perp-orthogonality in 119872
0implies that
Z1015840Z is diagonal However the converse fails and instead isembodied in R
119862isin 1198720 In consequence conventional VIF
119888s
are deficient in applying only to slopes whereas the VF119888s
resulting from Definition 12(ii) apply informatively for all of[1205730 1205731 1205732 120573
119896]
Definition 13 Designate VF119907(sdot) and VF
119888(sdot) as Variance Fac-
tors resulting from Reference models of Definition 8 andDefinition 12(ii) respectively On occasion these representVariance Deflation in addition to VIFs
Essential invariance properties pertain ConventionalVIF119888s are scale invariant see Section 3 of [9]We see next that
they are translation invariant as well To these ends we shiftthe columns of X=[X
1X2 X
119896] to a new origin through
X rarr X minus L = H where L = [11988811119899 11988821119899 119888
1198961119899] The
resulting model is
Y = (12057301119899+ L120573) +H120573 + 120598
lArrrArr Y = (1205730+ 120595) 1
119899+H120573 + 120598
(15)
thus preserving slopes where120595 = c1015840120573=(11988811205731+11988821205732+sdot sdot sdot+119888
119896120573119896)
Corresponding to X0= [1119899X] is U
0= [1119899H] and to X1015840
0X0
and its inverse are
U10158400U0= [
119899 11015840119899H
H10158401119899
H1015840H] (U10158400U0)minus1
= [11988700
b01
b101584001
G ] (16)
in the form of (6)This pertains to subsequent developmentsand basic invariance results emerge as follows
Lemma 14 Consider Y = 12057301119899+ X120573 + 120598 together with the
shifted version Y = (1205730+ 120595)1
119899+ H120573 + 120576 both in 119872
0 Then
the matrices G appearing in (6) and (16) are identical
Proof Rules for block-partitioned inverses again assert thatG of (16) is the inverse of
(H1015840H minus 119899minus1H10158401
11989911015840119899H) = H1015840 (I
119899minus 119899minus1111989911015840119899)H
= H1015840B119899H = X1015840B
119899X
(17)
since B119899L = 0 to complete our proof
These facts in turn support subsequent claims that cen-tered VIF
119888s are translation and scale invariant for slopes
1015840
=
[1205731 1205732 120573
119896] apart from the intercept 120573
0
43 A Critique Again we distinguish VIF119888s and VIF
119906s from
centered and uncentered regressorsThe following commentsapply
(C1) A design is either 119881perp or 119862perp-orthogonal respectivelyaccording as the lower right (119896times119896) blockX1015840X ofX1015840
0X0
orC = (X1015840Xminus119899XX1015840) from expression (6) is diagonalOrthogonalities of type 119881perp and 119862
perp are exclusive andhence work at crossed purposes
(C2) In particular 119862perp-orthogonality holds if and only ifthe columns of X are 119881
perp-nonorthogonal If X is 119862perp-orthogonal then for 119894 = 119895 C
119908(119894119895) = X1015840
119894X119895minus 119899119883119894119883119895=
0 and X1015840119894X119895
= 0 Conversely if X indeed is 119881perp-
orthogonal so that X1015840X = D= Diag(11988911 119889
119896119896)
then (D minus 119899XX1015840) cannot be diagonal as Reference inwhich case the conventional VIF
119888s are tenuous
(C3) Conventional VIF119906s based on uncentered X in 119872
0
with X = 0 do not gage variance inflation as claimedfounded on the false tenet that X1015840
0X0can be diagonal
(C4) To detect influential observations and to classify highleverage points casendashinfluence diagnostics namely119891119895119868
= VIF119895(minus119868)
VIF119895 1 le 119895 le 119896 are studied in [27]
for assessing the impact of subsets on variance infla-tion Here VIF
119895is from the full data and VIF
119895(minus119868)on
deleting observations in the index set 119868 Similarly [28]proposes using VIF
(119894)on deleting the 119894th observation
In the present context these would gain substance onmodifying VF
119907and VF
119888accordingly
5 Case Study 1 Continued
51 The Setting We continue an elementary and transparentexample to illustrate essentials Recall119872
0 119884119894= 1205730+12057311198831+
12057321198832+120598119894 of Section 22 the designX
0= [15X1X2] of order
(5 times 3) and U = X10158400X0 and its inverse V = (X10158400X0)minus1 as in
expressions (1) The design is neither 119881perp nor 119862perp orthogonalsince neither the lower right (2 times 2) block of X1015840
0X0nor the
centered Z1015840Z is diagonal Moreover the uncentered VIF119906s as
listed in (3) are not the vaunted relative increases in variancesowing to nonorthogonal columns of X
0 Indeed the only
opportunity for 119881perp-orthogonality here is between columns
[X1X2]
Nonetheless from Section 41 we utilize Theorem 4(i)and (10) to connect the collinearity indices of [9] to anglesbetween subspaces Specifically Minitab recovered valuesfor the residuals RS
119895= x
119895minus P119895x1198952 119895 = 0 1 2 fur-
ther computations proceed routinely as listed in Table 1 Inparticular the principal angle 120579
0between the constant vector
and the span of the regressor vectors is 1205790= 31751 deg as
anticipated in Remark 7
52 119881perp-Orthogonality For subsequent reference let 119863(119909) bea design as in (1) but with second-moment matrix
U (119909) = [
[
5 3 1
3 25 119909
1 119909 3
]
]
(18)
8 Advances in Decision Sciences
Table 1 Values for residuals RS119895= x119895minus P119895x1198952 for X
1198952 for 1205812
119906119895= VI F
119906(120573119895) for (1 minus 1120581
2
119906119895)12 and for 120579
119895 for 119895 = 0 1 2
119895 RS119895
X1198952
1205812
119895(1 minus 1120581
2
119906119895)12
120579119895(deg)
0 13846 50 36111 08503 317511 06429 25 38889 08619 304702 25714 30 11667 03780 67792
Invoking Definition 8 we seek a 119881perp-orthogonal Reference
with moment matrix U(0) having X10158401X2
= 0 that is 119863(0)This is found constructively on retaining [1
5 ∙X2] in X
0
but replacing X1by X01
= [1 05 05 1 0]1015840 as listed in
Table 2 giving the design 119863(0) with columns [15X01X2] as
ReferenceAccordingly R
119881= U(0) is ldquoas 119881
perp-orthogonal as it cangetrdquo given the lengths and sums of (X
1X2) as prescribed
in the experiment and as preserved in the Reference modelLemma 9 with X=[06 02]
1015840 and D= Diag(25 3) givesX1015840Dminus1X = 0157333 and 120574 = 5(1 minus 5X1015840Dminus1X) =
234375 Applications of Lemma 9(iii)-(iv) in succession givethe reference variances
Var (1205730| X1015840X = D) = [5 (1 minus 5X1015840Dminus1X)]
minus1
= 09375
Var (1205731|X1015840X=D)=
1
11988911
[1+1205741198832
1
11988911
]=2
5[1+
072120574
5]=17500
Var (1205732|X1015840X=D)=
1
11988922
[1+1205741198832
2
11988922
]=1
3[1+
004120574
3]=04375
(19)
As Reference these combine with actual variances from119881 = [119881
119894119895] at (1) giving VF
119907for the original designX
0relative
to R119881 For example VF
119907(1205730) = 0722209375 = 07704 with
11988111
= 07222 from (1) and 09375 from (19) This contrastswith VIF
119906(1205730) = 36111 in (3) as in Section 22 and Table 2
it is noteworthy that the nonorthogonal design 119863(1) yieldsuniformly smaller variances than the 119881
perp-orthogonal R119881
namely119863(0)Further versions 119863(119909) 119909 isin [minus05 00 05 06 10 15]
of our basic design are listed for comparison in Table 2 where119909 = X1015840
1X2for the various designs The designs themselves are
exhibited on transposing rowsX10158401= [11988311 11988312 11988313 11988314 11988315]
from Table 2 and substituting each into the template X00
=
[15 ∙X2] The design 119863(2) was constructed but not listed
since its X10158400X0is not invertible Clearly these are actual
designs amenable to experimental deployment
53 119862perp-Orthogonality Continuing X0
= [15X1X2] and
invoking Definition 12(ii) we seek a 119862perp-orthogonal
Reference having the matrix C119908
as diagonal This is
identified as 119863(06) in Table 2 From this the matrix R119862and
its inverse are
R119862= [
[
5 3 1
3 25 06
1 06 3
]
]
Rminus1119862
= [
[
07286 minus08572 minus00714
minus08571 14286 00
minus00714 00 03571
]
]
(20)
The variance factors are listed in (4) where for exampleVF119888(1205732) = 0388903571 = 10889 As distinct from conven-
tional VIF119888s for 120573
1and 120573
2only our VF
119888(1205730) here reflects
Variance Deflationwherein 1205730is estimated more precisely in
the initial 119862perp-nonorthogonal designA further note on orthogonality is germane Suppose
that the actual experiment is 119863(0) yet towards a thoroughdiagnosis the user evaluates the conventional VIF
119888(1205731) =
12250 = VIF119888(1205732) as ratios of variances of 119863(0) to 119863(06)
from Table 2 Unfortunately their meaning is obscured sincea design cannot at once be 119881perp and 119862
perp-orthogonalAs in Remark 2 we next alter the basic designX
0at (1) on
shifting columns of themeasurements to [0 0] asminima andscaling to have squared lengths equal to 119899 = 5 The resultingX0follows directly from (1) The new matrix U = X
1015840
0X0 andits inverse V are
U = [
[
5 42426 42426
42426 5 4
42426 4 5
]
]
V = [
[
1 minus04714 minus04714
minus04714 07778 minus02222
minus04714 minus02222 07778
]
]
(21)
giving conventional VIF119906s as [5 38889 38889] Against 119862perp-
orthogonality this gives the diagnostics
[VF119888(1205730) VF119888(1205731) VF119888(1205732)] = [08140 10889 10889]
(22)
demonstrating in comparison with (4) that VF119888s apart from
1205730 are invariant under translation and scaling a consequence
of Lemma 14We further seek to compare variances for the shifted
and scaled (X1X2) against 119881perp-orthogonality as Reference
However from U = X10158400X0at (21) we determine that 119883
1=
084853 = 1198832and 5[(119883
2
15) + (119883
2
25)] = 14400 Since
the latter exceeds unity we infer from Lemma 9(i) that
Advances in Decision Sciences 9
Table 2 Designs 119863(119909) 119909 isin [minus05 00 05 06 10 15] obtained by substitutingX01= [11988311 11988312 11988313 11988314 11988315]1015840 in [15 ∙X2] of order (5times3)
and corresponding variances Var (1205730)Var (120573
1)Var (120573
2)
Design 11988311
11988312
11988313
11988314
11988315
Var (1205730) Var (120573
1) Var (120573
2)
119863(minus05) 10 00 05 10 05 19333 37333 09333119863(00) 10 05 05 10 00 09375 17500 04375119863(05) 05 10 00 10 05 07436 14359 03590119863(06) 06091 05793 06298 11819 0000 07286 14286 03570119863(10) 00 05 05 10 10 07222 15556 03889119863(15) 00 10 05 05 10 09130 24348 06087
119881perp-orthogonality is incompatible with this configuration
of the data so that the VF119907s are undefined Equivalently
on evaluating 1206011
= 1206012
= 2290 deg and 1206011+ 1206012
=
4580 lt 90 deg Corollary 10 asserts that R119881is not positive
definite This appears to be anomalous until we recallthat 119881perp-orthogonality is invariant under rescaling but notrecentering the regressors
54 A Critique We reiterate apparent ambiguity ascribedin the literature to orthogonality A number of widely heldguiding precepts has evolved as enumerated in part in ouropening paragraph and in Section 3 As in Remark 3 theseclearly have been taken to apply verbatim forX
0andX1015840
0X0in
1198720 to be paraphrased in part as follows(P1) Ill-conditioning espouses inflated variances that is
VIFs necessarily equal or exceed unity(P2) 119862perp-orthogonal designs are ldquoidealrdquo in that VIF
119888s for
such designs are all unity see [11] for exampleWe next reassess these precepts as they play out under119881perp and 119862
perp orthogonality in connection with Table 2(C1) For models in 119872
0having X = 0 we reiterate that
the uncentered VIF119906s at expression (3) namely
[36111 38889 11667] overstate adverse effects onvariances of the 119881perp-nonorthogonal array since X1015840
0X0
cannot be diagonal Moreover these conventionalvalues fail to discern that the revised values for VF
119907s
at (5) namely [07704 08889 08889] reflect that[1205730 1205731 1205732] are estimated with greater efficiency in the
119881perp-nonorthogonal array119863(1)
(C2) For 119881perp-orthogonality the claim P1 thus is false by
counterexampleTheVFs at (5) areVarianceDeflationFactors for design 119863(1) where X1015840
1X2= 1 relative to
the 119881perp-orthogonal119863(0)(C3) Additionally the variances Var[120573
0(119909)] Var[120573
1(119909)]
Var[1205732(119909)] in Table 2 are all seen to decrease from
119863(0) [09375 17500 04375] rarr 119863(1) [0722215556 03889] despite the transition from the 119881
perp-orthogonal 119863(0) to the 119881
perp-nonorthogonal 119863(1)Similar trends are confirmed in other ill-conditioneddata sets from the literature
(C4) For 119862perp-orthogonal designs the claim P2 is false by
counterexample Such designs need not be ldquoidealrdquoas 1205730may be estimated more efficiently in a 119862
perp-nonorthogonal design as demonstrated by the VF
119888s
for 1205730at (4) and (22) Similar trends are confirmed
elsewhere as seen subsequently
(C5) VFs for 1205730are critical in prediction where prediction
variances necessarily depend on Var(1205730) especially
for predicting near the origin of the system of coor-dinates
(C6) Dissonance between 119881perp and 119862
perp is seen in Table 2where 119863(06) as 119862
perp-orthogonal with X10158401X2
= 06is the antithesis of 119881perp-orthogonality at 119863(0) whereX10158401X2= 0
(C7) In short these transparent examples serve to dispelthe decades-old mantra that ill-conditioning neces-sarily spawns inflated variances formodels in119872
0 and
they serve to illuminate the contributing structures
6 Orthogonal and Linked Arrays
A genuine 119881perp-orthogonal array was generated as eigenvec-
tors E = [E1E2 E
8] from a positive definite (8 times 8)
matrix (see Table 83 of [11 p377]) using PROC IML of theSAS programming package The columns [X
1X2X3] as the
second through fourth columns of Table 3 comprise the firstthree eigenvectors scaled to length 8 These apply for modelsin 119872119908and 119872
0to be analyzed In addition linked vectors
(Z1Z2Z3) were constructed as Z
1= 119888(09E
1+ 01E
2)
Z2
= 119888(09E2+ 01E
3) and Z
3= 119888(09E
3+ 01E
4) where
119888 = radic8082 Clearly these arrays as listed in Table 3 are notarchaic abstractions as both are amenable to experimentalimplementation
61 The Model 1198720 We consider in turn the orthogonal and
the linked series
611 Orthogonal Data Matrices X10158400X0and (X1015840
0X0)minus1 for the
orthogonal data under model1198720are listed in Table 4 where
variances occupy diagonals of (X10158400X0)minus1 The conventional
uncentered VIF119906s are [120296 102144 112256 105888]
Since X = 0 we find the angle between the constant andthe span of the regressor vectors to be 120579
0= 65748 deg as
in Theorem 4(ii) Moreover the angle between X1and the
span of [1119899X2X3] namely 120579
1= 81670 deg is not 90 deg
because of collinearity with the constant vector despite themutual orthogonality of [X
1X2X3] Observe here thatX1015840
0X0
10 Advances in Decision Sciences
Table 3 Basic 119881perp-orthogonal design X0 = [18 X1 X2 X3] and a linked design Z0 = [18 Z1 Z2 Z3] of order (8 times 4)
Orthogonal (X1X2X3) Linked (Z
1Z2Z3)
18
X1
X2
X3
18
Z1
Z2
Z3
1 minus1084470 0056899 0541778 1 minus1071550 0116381 06534861 minus1090010 minus0040490 0117026 1 minus1087820 minus0027320 01160861 0979050 0002564 2337454 1 0973345 0260677 21996871 minus1090230 0016839 0170711 1 minus1081700 0035588 00706941 0127352 2821941 minus0071670 1 0438204 2796767 minus00707101 1045409 minus0122670 minus1476870 1 1025468 minus0285010 minus16185601 1090803 minus0088970 0043354 1 1074306 minus0083640 01769281 1090739 minus0092300 0108637 1 1073875 minus0079740 0244567
Table 4 Matrices X10158400X0 and (X10158400X0)minus1 for the orthogonal data under model119872
0
X10158400X0
(X10158400X0)minus1
8 10686 25538 17704 01504 minus00201 minus00480 minus0033310686 8 0 0 minus00201 01277 00064 0004425538 0 8 0 minus00480 00064 01403 0010617704 0 0 8 minus00333 00044 00106 01324
already is 119881perp-orthogonal accordingly R119881= X10158400X0 and thus
the VF119907s are all unity
In view of dissonance between 119881perp and 119862
perp orthogonalitythe sums of squares and products for the mean-centered Xare
X1015840B119899X = [
[
78572 minus03411 minus02365
minus03411 71847 minus05652
minus02365 minus05652 76082
]
]
(23)
reflecting nonnegligible negative dependencies to distin-guish the 119881
perp-orthogonal X from the 119862perp-nonorthogonal
matrix Z = B119899X of deviations
To continue we suppose instead that the 119881perp-orthogonalmodel was to be recast as 119862
perp-orthogonal The Referencemodel and its inverse are listed in Table 5 Variance factorsas ratios of diagonal elements of (X1015840
0X0)minus1 in Table 4 to those
of Rminus1119862
in Table 5 are
[VF119888(1205730) VF119888(1205731) VF119888(1205732) VF119888(1205733)]
= [10168 10032 10082 10070]
(24)
reflecting negligible differences in precision This parallelsresults reported in Table 2 comparing 119863(0) to 119863(06) asReference except for larger differences in Table 2 namely[128681225012250]
612 Linked Data For the Linked Data under model 1198720
the matrix Z10158400Z0 follows routinely from Table 3 its inverse(Z10158400Z0)minus1 is listed in Table 6 The conventional uncentered
VIF119906s are now [120320 103408 113760 105480] These
again fail to gage variance inflation since Z10158400Z0cannot be
diagonal From (10) the principal angle between the constantvector and the span of the regressor vectors is 120579
0= 65735
degrees
Remark 15 Observe that the choice for R119862may be posed as
seeking R119862
= U(119909) such that Rminus1119862(2 3) = 00 then solving
numerically for 119909 using Maple for example However thealgorithm in Definition 12(ii) affords a direct solution thatC119908should be diagonal stipulates its off-diagonal element as
X10158401X2minus 35 = 0 so that X1015840
1X2= 06 in R
119862at expression (20)
To illustrate Theorem 4(iii) we compute cos(120579) = (1 minus
110889)12 = 02857 and 120579 = 73398 deg as the angle between
the vectors [X1X2] of (1) when centered to their means For
slopes in 1198720 [9] shows that VIF
119888(120573119895) lt VIF
119906(120573119895) 1 le
119894 le 119896 This follows since their numerators are equal butdenominators are reciprocals of lengths of the centered anduncentered regressor vectors To illustrate numerators areequal for VIF
119906(1205733) = 03889(13) = 11667 and for
VIF119888(1205733) = 03889(128) = 10889 but denominators are
reciprocals of X10158402X2= 3 and sum
5
119895=1(1198832119895
minus 1198832)2= 28
(119894) 119881perp Reference Model To rectify this malapropism we seek
in Definition 8 a model R119881as Reference This is found on
setting all off-diagonal elements of Z10158400Z0to zero excluding
the first row and column Its inverse Rminus1119881
is listed in Table 6The VF
119907s are ratios of diagonal elements of (Z1015840
0Z0)minus1 on the
left to diagonal elements of Rminus1119881
on the right namely
[VF119907(1205730) VF119907(1205731) VF119907(1205732) VF119907(1205733)]
= [09697 09991 09936 09943]
(25)
Against conventional wisdom it is counter-intuitive that[1205730 1205731 1205732 1205733] are all estimated with greater precision in the
119881perp-nonorthogonal model Z1015840
0Z0than in the 119881
perp-orthogonalmodel R
119881of Definition 8 As in Section 54 this serves again
to refute the tenet that ill-conditioning espouses inflated
Advances in Decision Sciences 11
Table 5 Matrices R119862and Rminus1
119862for checking 119862
perp-orthogonality in the orthogonal data X0 under model1198720
R119862
Rminus1119862
8 10686 25538 17704 01479 minus00170 minus00444 minus0029110686 8 03411 02365 minus00170 01273 0 025538 03411 8 05652 minus00444 0 01392 017704 02365 05652 8 minus00291 0 0 01314
Table 6 Matrices (Z10158400Z0)minus1 and Rminus1
119881for checking 119881
perp-orthogonality for the linked data under model1198720
(Z10158400Z0)minus1 Rminus1
119881
01504 minus00202 minus00461 minus00283 01551 minus00261 minus00530 minus00344minus00202 01293 minus000797 00053 minus00261 01294 00089 00058minus00461 minus00079 01422 minus00054 minus00530 00089 01431 00117minus00283 00053 minus00054 01319 minus00344 00058 00117 01326
variances for models in1198720 that is that VFs necessarily equal
or exceed unity(119894119894) 119862
perp Reference Model Further consider variances werethe Linked Data to be centered As in Definition 12(ii)we seek a Reference model R
119862such that C
119908= (W minus
M) is diagonal The required R119862
and its inverse arelisted in Table 7 Since variances in the Linked Data are[015040 012926 014220 013185] from Table 6 and Refer-ence values appear as diagonal elements of Rminus1
119862on the right
in Table 7 their ratios give the centered VF119888s as
[VF119888(1205730) VF119888(1205731) VF119888(1205732) VF119888(1205733)]
= [09920 10049 10047 10030]
(26)
We infer that changes in precision would be negligible ifinstead the Linked Data experiment was to be recast as a 119862perp-orthogonal experiment
As in Remark 15 the reference matrix R119862is constrained
by the first and second moments from X10158400X0and has free
parameters 1199091 1199092 1199093 for the (2 3) (2 4) (3 4) entries
The constraints for 119862perp-orthogonality are Rminus1
119862(2 3) =
Rminus1119862(2 4) = Rminus1
119862(3 4) = 0 Although this system of
equations can be solved with numerical software such asMaple the algorithm stated in Definition 12 easily yields thedirect solution given here
62 The Model 119872119908 For the Linked Data take 119884
119894= 12057311198851198941+
12057321198851198942
+ 12057331198851198943
+ 120598119894 1 le 119894 le 8 as Y = Z120573 + 120598
where the expected response at 1198851
= 1198852
= 1198853
= 0
is 119864(119884) = 00 and there is no occasion to translate theregressorsThe lower right (3 times 3) submatrix ofZ1015840
0Z0(4 times 4)
from 1198720is Z1015840Z its inverse is listed in Table 8 The ratios
of diagonal elements of (Z1015840Z)minus1 to reciprocals of diagonalelements of Z1015840Z are the conventional uncentered VIF
119906s
namely [10123 10247 10123] Equivalently scaling Z1015840Z tohave unit diagonals and inverting gives Cminus1 as in Table 8 itsdiagonal elements are the VIF
119906sThroughout Section 6 these
comprise the only correct interpretation of conventionalVIF119906s as genuine variance ratios
7 Case Study Body Fat Data
71 The Setting Body fat and morphogenic measurementswere reported for 119899 = 20 human subjects in Table D11 of[29 p717] Response and regressor variables are119884 amount ofbody fat X
1 triceps skinfold thickness X
2 thigh circumfer-
ence and X3 mid-arm circumference under119872
0 119884119894= 1205730+
12057311198831+12057321198832+12057331198833+ 120598119894 From the original data as reported
the condition number of X10158400X0is 1039931 and the VIF
119906s are
[2714800 4469419 630998 778703] On scaling columnsof X to equal column lengths X
119894 = 50 119894 = 1 2 3 the
resulting condition number of the revisedX10158400X0is 2 9696518
and the uncenteredVIF119906s are as before from invariance under
scalingAgainst complaints that regressors are remote from their
natural origins we center regressors to [0 0 0] as origin onsubtracting the minimal element from each column as inRemark 2 and scaling these to X
119894 = 50 119894 = 1 2 3
This is motivated on grounds that centered diagnostics apartfrom VIF
119888(1205730) are invariant under shifts and scalings from
Lemma 14 In addition under the new origin [0 0 0] 1205730
assumes prominence as the baseline response against whichchanges due to regressors are to be gaged The original dataare available on the publisherrsquos web page for [29]
Taking these shifted and scaled vectors as columnsof X = [X
1X2X3] in the revised X
0= [1
119899X]
values for X10158400X0
and its inverse are listed in Table 9The condition number of X1015840
0X0
is now 1136969 andits VIF
119906s are [6775617998742782174484] Additionally
from Theorem 4(ii) we recover the angle 1205790= 22592 deg to
quantify collinearity of regressors with the constant vectorIn addition the scaled and centered correlationmatrix for
X is
R = [
[
10 00847 08781
00847 10 01424
08781 01424 10
]
]
(27)
indicating strong linkage between mean-centered versions of(X1X3)Moreover the experimental design is neither119881perp nor
119862perp-orthogonal since neither the lower right (3 times 3) block of
X10158400X0nor the centered matrixC = (X1015840Xminus119899XX1015840) is diagonal
12 Advances in Decision Sciences
Table 7 Matrices R119862and Rminus1
119862for checking 119862
perp-orthogonality in the linked data under model1198720
R119862
Rminus1119862
8 13441 27337 17722 01516 minus00216 minus00484 minus0029113441 8 04593 02977 minus00216 01286 0 027337 04593 8 06056 minus00484 0 01415 017722 02977 06056 8 minus00291 0 0 01314
Table 8 Matrices (Z1015840Z)minus1 and Cminus1 for the linked data under model119872119908
(Z1015840Z)minus1 Cminus1
01265 minus00141 00015 10123 minus01125 00123minus00141 01281 minus00141 minus01125 10247 minus0112500015 minus00141 01265 00123 minus01125 10123
(119894) 119881perp Reference Model To compare this with a 119881
perp-or-thogonal design we turn again to Definition 8 Howeverevaluating the test criterion in Lemma 9(119894) shows that thisconfiguration is not commensurate with 119881
perp-orthogonalityso that the VF
119907s are undefined Comparisons with 119862
perp-orthogonal models are addressed next(119894119894) 119862perp Reference Model As in Definition 12(ii) the centering
matrix is
119899XX1015840 = [
[
188889 189402 187499
189402 189916 188007
187499 188007 186118
]
]
(28)
To check against 119862perp-orthogonality we obtain the matrix Wof Definition 12(ii) on replacing off-diagonal elements of thelower right (3 times 3) submatrix ofX1015840
0X0by corresponding off-
diagonal elements of 119899XX1015840 The result is the matrix R119862and
its inverse as given in Table 10 where diagonal elements ofRminus1119862
are the Reference variances The VF119888s found as ratios
of diagonal elements of (X10158400X0)minus1 relative to those of Rminus1
119862
are listed in Table 12 under G = [0 0 0] the indicator ofDefinition 12(i) for this case
72 Variance Factors and Linkage Traditional gages of ill-conditioning are patently absurd on occasion By conventionVIFs are ldquoall or nonerdquo wherein Reference models entailstrictly diagonal components In practice some regressorsare inextricably linked to get or even visualize orthogonalregressors may go beyond feasible experimental rangesrequire extrapolation beyond practical limits and challengecredulity Response functions so constrained are described in[30] as ldquopicket fencesrdquo In such cases taking C to be diagonalas its 119862perp Reference is moot at best an academic folly abettedin turn by default in standard software packages In shortconventional diagnostics here offer answers to irrelevantquestions Given pairs of regressors irrevocably linked weseek instead to assess effects on variances for other pairsthat could be unlinked by design These comments applyespecially to entries under G = [0 0 0] in Table 12
To proceed we define essential ill-conditioning as regres-sors inextricably linked necessarily to remain so and asnonessential if regressors could be unlinked by design As a
followup to Section 32 for X = 0 we infer that the constantvector is inextricably linked with regressors thus accountingfor essential ill-conditioning not removed by centeringcontrary to claims in [4] Indeed this is the essence ofDefinition 8
Fortunately these limitations may be remedied throughReference models adapted to this purpose This in turnexploits the special notation and trappings of Definition 12(i)as follows In addition toG = [0 0 0] we iterate for indicatorsG = [119892
12 11989213 11989223] taking values in [1 0 0] [0 1 0] [0 0 1]
[1 1 0] [1 0 1] [0 1 1] Values of VF119888s thus obtained are
listed in Table 12 To fix ideas the Reference R119862for G =
[1 0 0] that is for the constraint R119862(2 3) = X1015840
0X0(2 3) =
194533 is given in Table 11 together with its inverse Foundas ratios of diagonal elements of (X1015840
0X0)minus1 relative to those of
Rminus1119862
in Table 11 VF119888s are listed in Table 12 underG = [1 0 0]
In short R119862is ldquoas 119862
perp-orthogonal as it could berdquo given theconstraint Other VF
119888s in Table 12 proceed similarly For all
cases in Table 12 excluding G = [0 1 1] 1205730is estimated
with greater efficiency in the original model than any of thepostulated Reference models
As in Remark 15 the reference matrix R119862
for G =
[1 0 0] is constrained by X10158401X2
= 194533 to be retainedwhereas X1015840
1X3
= 1199091and X1015840
2X3
= 1199092are to be determined
from Rminus1119862(2 4) = Rminus1
119862(3 4) = 0 Although this system
can be solved numerically as noted the algorithm stated inDefinition 12 easily yields the direct solution given here
Recall 1198831 triceps skinfold thickness 119883
2 thigh cir-
cumference and 1198833 mid-arm circumference and from
(27) that (1198831 1198833) are strongly linked Invoking the cutoff
rule [24] for VIFs in excess of 40 it is seen for G =
[0 0 0] in Table 12 that VF119888(1205731) and VF
119888(1205733) are excessive
where [X1X2X3] are forced to be mutually uncorrelated
as Reference Remarkably similar values are reported forG isin [1 0 0] [ 0 0 1] [1 0 1] all gaged against intractableReference models wherein (119883
1 1198833) are postulated to be
uncorrelated however incrediblyOn the other hand VF
119888s for G isin [0 1 0] [1 1 0]
[0 1 1] in Table 12 are likewise comparable where (X1X3)
are allowed to remain linked at X10158401X3= 242362 Here the
VF119888s reflect negligible changes in efficiency of the estimates
Advances in Decision Sciences 13
Table 9 Matrices X10158400X0 and (X10158400X0)minus1 for the body fat data under model119872
0centered to minima and scaled to equal lengths
X10158400X0
(X10158400X0)minus1
20 194365 194893 192934 03388 minus01284 minus01482 minus00203194365 25 194533 242362 minus01284 07199 00299 minus06224194893 194533 25 196832 minus01482 00299 01711 minus00494192934 242362 196832 25 minus00203 minus06224 minus00494 06979
Table 10 Matrices R119862and Rminus1
119862for the body fat data adjusted to give off-diagonal elements the value 00 with code G = [0 0 0]
R119862
Rminus1119862
20 194365 194893 192934 05083 minus01590 minus01622 minus01510194365 25 189402 187499 minus01590 01636 0 0194893 189402 25 188007 minus01622 0 01664 0192934 187499 188007 25 minus01510 0 0 01565
in comparison with the originalX10158400X0 where [X
1X2X3] are
all linked In summary negligible changes in overall efficiencywould accrue on recasting the experiment so that (X
1X2)
and (X2X3) were pairwise uncorrelated
Table 12 conveys further useful information on notingthat VFs as ratios are multiplicative and divisible Forexample take the model codedG = [1 1 0] whereX1015840
1X2and
X10158401X3are retained at their initial values under X1015840
0X0 namely
194533 and 242362 from Table 9 with (X2X3) unlinked
Now taking G = [0 1 0] as reference we extract the VFson dividing elements in Table 12 at G = [0 1 0] by those ofG = [1 1 0] The result is
[VF (1205730) VF (120573
1) VF (120573
2) VF (120573
3)]
= [09196
0984810073
1005710282
1020810208
10208]
= [09338 10017 10072 10000]
(29)
8 Conclusions
Our goal is clarity in the actual workings of VIFs as ill-conditioning diagnostics thus to identify and rectifymiscon-ceptions regarding the use of VIFs for models X
0= [1119899X]
with intercept Would-be arbiters for ldquocorrectrdquo usage havedivided between VIF
119888s for mean-centered regressors and
VIF119906s for uncentered data We take issue with both Conven-
tional but clouded vision holds that (i) VIF119906s gage variance
inflation owing to nonorthogonal columns of X0as vectors
in R119899 (ii) VIFs ge 10 and (iii) models having orthogonalmean-centered regressors are ldquoidealrdquo in possessing VIF
119888s of
value unity Reprising Remark 3 these properties widely andcorrectly understood for models in119872
119908 appear to have been
expropriated without verification for models in1198720hence the
anomaliesAccordingly we have distinguished vector-space orthog-
onality (119881perp) of columns of X0 in contrast to orthogonality
(119862perp) of mean-centered columns of X A key feature is theconstruction of second-moment arrays as Reference models(R119881R119862) encoding orthogonalities of these types and their
Variance Factors as VF119907rsquos and VF
119888rsquos respectively Contrary
to convention we demonstrate analytically and through casestudies that (i1015840) VIF
119906s do not gage variance inflation (ii1015840)
VFs ge 10 need not hold whereas VF lt 10 representsVariance Deflation and (iii1015840) models having orthogonalcentered regressors are not necessarily ldquoidealrdquo In particularvariance deflation occurs when ill-conditioned data yieldsmaller variances than corresponding orthogonal surrogates
In short our studies call out to adjudicate the prolongeddispute regarding centered and uncentered diagnostics formodels in119872
0 Our summary conclusions follow
(S1) The choice of a model with or without intercept isa substantive matter in the context of each exper-imental paradigm That choice is fixed beforehandin a particular setting is structural and is beyondthe scope of the present study Snee and Marquardt[10] correctly assert that VIFs are fully creditedand remain undisputed in models without interceptthese quantities unambiguously retain the meaningsoriginally intended namely as ratios of variances togage effects of nonorthogonal regressors For modelswith intercept the choice is between centered anduncentered diagnostics which is to be weighed asfollows
(S2) ConventionalVIF119888s apply incompletely to slopes only
not necessarily grasping fully that a system is illconditioned Of particular concern is missing evi-dence on collinearity of regressors with the interceptOn the other hand if 119862
perp-orthogonality pertainsthen VF
119888s may supplant VIF
119888s as applicable to all of
[1205730 1205731 120573
119896] Moreover the latter may reveal that
VF119888(1205730) lt 10 as in cited examples and of special
import in predicting near the origin of the coordinatesystem
(S3) Our VF119907s capture correctly the concept apparently
intended by conventional VIF119906s Specifically if 119881perp-
orthogonality pertains then VF119907s as genuine ratios
of variances may supplant VIF119906s as now discredited
gages for variance inflation(S4) Returning to Section 32 we subscribe fully but for
different reasons to Belsleyrsquos and othersrsquo contention
14 Advances in Decision Sciences
Table 11 Matrices R119862and Rminus1
119862for the body fat data adjusted to give off-diagonal elements the value 00 except X10158401X2 code G = [1 0 0]
R119862
Rminus1119862
20 194365 194893 192934 04839 minus01465 minus01497 minus01510194365 25 194533 187499 minus01465 01648 minus00141 0194893 194533 25 188007 minus01497 minus00141 01676 0192934 187499 188007 25 minus01510 0 0 01565
Table 12 Summary VF119888s for the body fat data centered to minima and scaled to common lengths identified with varying G = [119892
12 11989213 11989223]
from Definition 12
Elements of G VF119888s
11989212
11989213
11989223
1205730
1205731
1205732
1205733
0 0 0 06665 43996 10282 44586
1 0 0 07002 43681 10208 44586
0 1 0 09196 10073 10282 10208
0 0 1 07201 43996 10073 43681
1 1 0 09848 10057 10208 10208
1 0 1 07595 43681 10003 43681
0 1 1 10248 10073 10073 10160
that uncentered VIF119906s are essential in assessing ill-
conditioned data In the geometry of ill-conditioningthese should be retained in the pantheon of regres-sion diagnostics not as now debunked gages ofvariance inflation but as more compelling angularmeasures of the collinearity of each column of X
0=
[1119899X1X2 X
119896] with the subspace spanned by
remaining columns(S5) Specifically the degree of collinearity of regressors
with the intercept is quantified by 1205790deg which
if small is known to ldquocorrupt the estimates of allparameters in the model whether or not the interceptis itself of interest and whether or not the data havebeen centeredrdquo [21 p90]This in turnwould appear toelevate 120579
0in importance as a critical ill-conditioning
diagnostic(S6) In contrast Theorem 4(iii) identifies conventional
VIF119888s with angles between a centered regressor and
the subspace spanned by remaining centered regres-sors For example
1205791= arccos([
[
1 minus1
VIFc ( ldquo1205731)]
]
12
) (30)
is the angle between the centered vector Z1= B119899X1
and the remaining centered vectors These lie in thelinear subspace of R119899 comprising the orthogonalcomplement to 1
119899isin R119899 and thus bear on the
geometry of ill-conditioning in this subspace(S7) Whereas conventional VIFs are ldquoall or nonerdquo to
be gaged against strictly diagonal components ourReference models enable unlinking what may beunlinked while leaving other pairs of regressorslinked This enables us to assess effects on variances
attributed to allowable partial dependencies amongsome but not other regressors
In conclusion the foregoing summary points to limita-tions in modern software packages Minitab v15 119878119875119878119878 v19and 119878119860119878 v92 are standard software packages which computecollinearity diagnostics returning with the default optionsthe centered VIFs VIF
119888(120573119895) 1 le 119895 le 119896 To compute
the uncentered VIF119906s one needs to define a variable for
the constant term and perform a linear regression using theoptions of fitting without an intercept On the other handcomputations for VF
119888s and VF
119907s as introduced here require
additional if straightforward programming for exampleMaple and PROC IML of the SAS System as utilized here
References
[1] D A Belsley ldquoDemeaning conditioning diagnostics throughcenteringrdquo The American Statistician vol 38 no 2 pp 73ndash771984
[2] E R Mansfield and B P Helms ldquoDetecting multicollinearityrdquoThe American Statistician vol 36 no 3 pp 158ndash160 1982
[3] S D Simon and J P Lesage ldquoThe impact of collinearityinvolving the intercept term on the numerical accuracy ofregressionrdquo Computer Science in Economics and Managementvol 1 no 2 pp 137ndash152 1988
[4] D W Marquardt and R D Snee ldquoRidge regression in practicerdquoThe American Statistician vol 29 no 1 pp 3ndash20 1975
[5] K N Berk ldquoTolerance and condition in regression computa-tionsrdquo Journal of the American Statistical Association vol 72 no360 part 1 pp 863ndash866 1977
[6] A E Beaton D Rubin and J Barone ldquoThe acceptability ofregression solutions another look at computational accuracyrdquoJournal of the American Statistical Association vol 71 no 353pp 158ndash168 1976
Advances in Decision Sciences 15
[7] R B Davies and B Hutton ldquoThe effect of errors in theindependent variables in linear regressionrdquo Biometrika vol 62no 2 pp 383ndash391 1975
[8] D W Marquardt ldquoGeneralized inverses ridge regressionbiased linear estimation and nonlinear estimationrdquo Technomet-rics vol 12 no 3 pp 591ndash612 1970
[9] G W Stewart ldquoCollinearity and least squares regressionrdquoStatistical Science vol 2 no 1 pp 68ndash84 1987
[10] R D Snee and D W Marquardt ldquoComment collinearitydiagnostics depend on the domain of prediction themodel andthe datardquo The American Statistician vol 38 no 2 pp 83ndash871984
[11] R HMyers Classical andModern Regression with ApplicationsPWS-Kent Boston Mass USA 2nd edition 1990
[12] D A Belsley E Kuh and R E Welsch Regression DiagnosticsIdentifying Influential Data and Sources of Collinearity JohnWiley amp Sons New York NY USA 1980
[13] A van der Sluis ldquoCondition numbers and equilibration ofmatricesrdquo Numerische Mathematik vol 14 pp 14ndash23 1969
[14] G C S Wang ldquoHow to handle multicollinearity in regressionmodelingrdquo Journal of Business Forecasting Methods amp Systemvol 15 no 1 pp 23ndash27 1996
[15] G C S Wang and C L Jain Regression Analysis Modeling ampForecasting Graceway Flushing NY USA 2003
[16] N Adnan M H Ahmad and R Adnan ldquoA comparative studyon some methods for handling multicollinearity problemsrdquoMatematika UTM vol 22 pp 109ndash119 2006
[17] D R Jensen and D E Ramirez ldquoAnomalies in the foundationsof ridge regressionrdquo International Statistical Review vol 76 pp89ndash105 2008
[18] D R Jensen and D E Ramirez ldquoSurrogate models in ill-conditioned systemsrdquo Journal of Statistical Planning and Infer-ence vol 140 no 7 pp 2069ndash2077 2010
[19] P F Velleman and R E Welsch ldquoEfficient computing ofregression diagnosticsrdquo The American Statistician vol 35 no4 pp 234ndash242 1981
[20] R F Gunst ldquoRegression analysis with multicollinear predictorvariables definition detection and effectsrdquo Communications inStatistics A vol 12 no 19 pp 2217ndash2260 1983
[21] G W Stewart ldquoComment well-conditioned collinearityindicesrdquo Statistical Science vol 2 no 1 pp 86ndash91 1987
[22] D A Belsley ldquoCentering the constant first-differencing andassessing conditioningrdquo in Model Reliability D A Belsley andE Kuh Eds pp 117ndash153 MIT Press Cambridge Mass USA1986
[23] G Smith and F Campbell ldquoA critique of some ridge regressionmethodsrdquo Journal of the American Statistical Association vol 75no 369 pp 74ndash103 1980
[24] R M Orsquobrien ldquoA caution regarding rules of thumb for varianceinflation factorsrdquoQualityampQuantity vol 41 no 5 pp 673ndash6902007
[25] R F Gunst ldquoComment toward a balanced assessment ofcollinearity diagnosticsrdquo The American Statistician vol 38 pp79ndash82 1984
[26] F A Graybill Introduction to Matrices with Applications inStatistics Wadsworth Belmont Calif USA 1969
[27] D Sengupta and P Bhimasankaram ldquoOn the roles of observa-tions in collinearity in the linearmodelrdquo Journal of the AmericanStatistical Association vol 92 no 439 pp 1024ndash1032 1997
[28] J O van Vuuren and W J Conradie ldquoDiagnostics and plotsfor identifying collinearity-influential observations and for clas-sifying multivariate outliersrdquo South African Statistical Journalvol 34 no 2 pp 135ndash161 2000
[29] R R HockingMethods and Applications of LinearModels JohnWiley amp Sons Hoboken NJ USA 2nd edition 2003
[30] R R Hocking and O J Pendleton ldquoThe regression dilemmardquoCommunications in Statistics vol 12 pp 497ndash527 1983
Advances in Decision Sciences 3
intercept deflated in comparison with the feasible centeredreference Specifically 120573
0is estimated here with greater
efficiency VF119888(1205730) = 09913 in the initial design (1) despite
nonorthogonality of its centered regressors (ii) Variances areuniformly smaller in the model (1) than in its feasible uncen-tered reference from (5) thus exhibiting Variance Deflationdespite nonorthogonality of the uncentered regressors Afull explication of the anatomy of this study appears laterIn support we next examine technical details needed insubsequent developments
23 Types of Orthogonality The ongoing divergence betweencentered and uncentered diagnostics traces in part to mean-ing ascribed to orthogonality Specifically the orthogonalityof columns of X in 119872
119908refers unambiguously to the vector-
space concept X119894perp X119895 that is X1015840
119894X119895
= 0 as does thenotion of collinearity of regressors with the constant vectorin 1198720 We refer to this as 119881-orthogonality in short 119881perp In
contrast nonorthogonality in 1198720often refers instead to the
statistical concept of correlation among columns of X whenscaled and centered to their means We refer to its negationas 119862-orthogonality or 119862
perp Distinguishing between thesenotions is fundamental as confusion otherwise is evident Forexample it is asserted in [11 p125] that ldquothe simple correlationcoefficient 119903
119894119895does measure linear dependency between 119909
119894
and 119909119895in the datardquo
24 The Models 1198720 Consider 119872
0 Y = X
0120579 + 120598 X
0=
[1119899X] 1205791015840 = [120573
01205731015840] together with
X10158400X0= [
119899 119899X1015840
119899X X1015840X] (X1015840
0X0)minus1
= [11988800
c01
c101584001
G ] (6)
where 119899X1015840 = 11015840119899X 11988800
= [119899 minus 1198992X1015840(X1015840X)
minus1X]minus1 C =
(X1015840Xminus119899XX1015840) =X1015840B119899X andG = Cminus1 from block-partitioned
inverses In later sections we often will denote the mean-centeringmatrix 119899XX1015840 byM In particular the centered formX rarr Z = B
119899X arises exclusively in models with intercept
with or without reparametrizing (1205730120573) rarr (120572120573) in the
mean-centered regressors where 120572 = 1205730+12057311198831+ sdot sdot sdot + 120573
119896119883119896
Scaling Z to unit column lengths gives Z1015840Z in correlationform with unit diagonals
3 Historical Perspective
Our objective is an overdue revision of the tenets of varianceinflation in regression To provide context we next surveyextenuating issues from the literature Direct quotations areintended not to subvert stances taken by the cited authorsModels in 119872
0are at issue since as noted in [10] centered
diagnostics have no place in119872119908
31 Background Aspects of ill-conditioning span a consid-erable literature over many decades Regarding Y = X120573 +120598 scaling columns of X to equal lengths approximatelyminimizes the condition number 119888
2(X) [12 p120] based on
[13] Nonetheless 1198882(X) is cast in [9] as a blunt instrument for
ill-conditioning prompting the need for VIFs and other localconstructs Stewart [9] credits VIFs in concept to Daniel andin name to Marquardt
Ill-conditioning points beyond OLS in view of difficultiescited earlier Remedies proffered in [14 15] include trans-forming variables adding new data and deleting variable(s)after checking critical features of the reduced model Otherintended palliatives include Ridge and Partial Least Squaresas compared in [16] Principal Components regression Sur-rogate models as in [17] All are intended to return reducedstandard errors at the expense of bias Moreover Surrogatesolutions more closely resemble those from an orthogonalsystem than Ridge [18] Together the foregoing and otheroptions comprise a substantial literature as collateral to butapart from the present study
32 To Center Advocacy for centering includes the follow-ing
(i) VIFs often are defined categorically as the diagonalsof the inverse of the correlation matrix of scaled andcentered regressors see [4 9 11 19] for exampleThese are VIF
119888s widely adopted without justification
as default to the exclusion of VIF119906s
(ii) It is asserted [4] that ldquocentering removes the nones-sential ill-conditioning thus reducing the varianceinflation in the coefficient estimatesrdquo
(iii) Centering is advocated when predictor variables arefar removed from origins on the basic data scales [1011]
33 Not to Center Advocacy for uncentered diagnosticsincludes the following caveats from proponents of centering
(i) Uncentered data should be examined only if an esti-mate of the intercept is of interest [9 10 20]
(ii) ldquoIf the domain of prediction includes the full rangefrom the natural origin through the range of data thecollinearity diagnostics should not bemean centeredrdquo[10 p84]
Other issues against centering derive in part from numer-ical analysis and work by Belsley
(i) Belsley [1] identifies 1198882(X) for a system Y = X120573+120598 as
ldquothe potential relative change in the LS solution b thatcan result from a small relative change in the datardquo
(ii) These require structurally interpretable variables asldquoones whose numerical values and (relative) differ-ences derive meaning and interpretability fromknowledge of the structure of the underlying lsquorealityrsquobeing modeledrdquo [1 p75]
(iii) ldquoThere is no such thing as lsquononessentialrsquo ill-condi-tioningrdquo and ldquomean-centering can remove from thedata the information needed to assess conditioningcorrectlyrdquo [1 p74]
(iv) ldquoCollinearity with the intercept can quite generallycorrupt the estimates of all parameters in the model
4 Advances in Decision Sciences
whether or not the intercept is itself of interest andwhether or not the data have been centeredrdquo [21 p90]
(v) An example [22 p121] gives severely ill-conditioneddata perfectly conditioned in centered form ldquocenter-ing alters neitherrdquo inflated variances nor extremelysensitive parameter estimates in the basic data more-over ldquodiagnosing the conditioning of the centereddata (which are perfectly conditioned) would com-pletely overlook this situation whereas diagnosticsbased on the basic data would notrdquo
(vi) To continue from [22] ill-conditioning persists inthe propagation of disturbances in that ldquoa 1 percentrelative change in theX
119894rsquos results in over a 40 relative
change in the estimatesrdquo despite perfect conditioningin centered form and ldquoknowledge of the effect ofsmall relative changes in the centered data is notmeaningful for assessing the sensitivity of the basic LSproblemrdquo since relative changes and their meaningsin centered anduncentered data oftendiffermarkedly
(vii) Regarding choice of origin ldquo(1) the investigator mustbe able to pick an origin against which small relativechanges can be appropriately assessed and (2) it is thedata measured relative to this origin that are relevantto diagnosing the conditioning of the LS problemrdquo[22 p126]
Other desiderata pertain
(i) ldquoBecause rewriting the model (in standardized vari-ables) does not affect any of the implicit estimates ithas no effect on the amount of information containedin the datardquo [23 p76]
(ii) Consequences of ill-advised diagnostics can besevere Degraded numerical accuracy traces to nearcollinearity of regressors with the constant vectorIn short centering fails to prevent a loss in numer-ical accuracy centered diagnostics are unable todiscern these potential accuracy problems whereasuncentered diagnostics are seen to work well Twowidely used statistical packages SAS and SPSS-Xfail to detect this type of ill-conditioning throughuse of centered diagnostics and thus return highlyinaccurate coefficient estimates For further detailssee [3]
On balance formodels in1198720the jury is out regarding the
use of centered or uncentered diagnostics to include VIFsEven Belsley [1] (and elsewhere) concedes circumstanceswhere centering does achieve structurally interpretable mod-els Of note is that the foregoing listed citations to Belsleyapply strictly to condition numbers 119888
2(X) other purveyors of
ill-conditioning specifically VIFs are not treated explicitly
34 A Synthesis It bears notice that (i) the origin how-ever remote from the cluster of regressors is essential forprediction and (ii) the prediction variance is invariant toparametrizing in centered or uncentered forms Additionalremarks are codified next for subsequent referral
Remark 2 Typically 119884 represents response to input vari-ables 119883
1 1198832 119883
119896 In a controlled experiment levels
are determined beforehand by subject-matter considerationsextraneous to the experiment to include minimal valuesHowever remote the origin on the basic data scales it seemsinformative in such circumstances to identify the originwith these minima In such cases the intercept is often ofsingular interest since 120573
0is then the standard against which
changes in 119884 are to be gaged as regressors vary We adopt thisconvention in subsequent studies from the archival literature
Remark 3 In summary the divergence in views whetherto center or not appears to be that critical aspects of ill-conditioning known and widely accepted for models in119872
119908
have been expropriated over decades mindlessly and withoutverification to apply point-by-point for models in119872
0
4 The Structure of Orthogonality
This section develops the foundations for Reference modelscapturing orthogonalities of types 119881
perp and 119862perp Essential
collateral results are given in support as well
41 Collinearity Indices Stewart [9] reexamines ill-condi-tioning from the perspective of numerical analysis Detailsfollow where X is a generic matrix of regressors havingcolumns x
119895 1 le 119895 le 119901 and Xdagger = (X1015840X)
minus1X1015840 is thegeneralized inverse of note having xdagger
119895 1 le 119895 le 119901 as
its typical rows Each corresponding collinearity index 120581 isdefined in [9 p72] as
120581119895=10038171003817100381710038171003817x119895
10038171003817100381710038171003817sdot10038171003817100381710038171003817xdagger119895
10038171003817100381710038171003817 1 le 119895 le 119901 (7)
constructed so as to be scale invariant Observe that xdagger1198952 is
found along the principal diagonal of [(Xdagger)(Xdagger)1015840] = (X1015840X)minus1
WhenX(119899times119896) inX0= [1119899X] is centered and scaled Section
3 of [9] shows that the centered collinearity indices 120581119888satisfy
1205812
119888119895= VIF
119888(120573119895) 1 le 119895 le 119896 In 119872
0with parameters (120573
0
120573) values corresponding to xdagger1198952 from Xdagger
0= (X10158400X0)minus1X10158400
lie along the principal diagonal of (X10158400X0)minus1 the uncentered
collinearity indices 120581119906now satisfy 120581
2
119906119895= x1198952sdot xdagger1198952
119895 =
0 1 119896 In particular since x0
= 1119899 we have 120581
2
1199060=
119899xdagger02 Moreover in119872
0it follows that the uncentered VIF
119906s
are squares of the collinearity indices that is VIF119906(120573119895) =
1205812
119906119895 119895 = 0 1 119896 Note the asymmetry that VIF
119888s exclude
the intercept in contrast to the inclusive VIF119906s That the
label Variance Inflation Factors for the latter is a misnomer iscovered in Remark 1 Nonetheless we continue the familiarnotation VIF
119906(120573119895) = 1205812
119906119895 119895 = 0 1 119896
Transcending the essential developments of [9] are con-nections between collinearity indices and angles between
Advances in Decision Sciences 5
subspaces To these ends choose a typical x119895in X0 and
rearrange X0as [x119895X[119895]] We next seek elements of
Q1015840119895(X10158400X0)minus1
Q119895= [
x1015840119895x119895
x1015840119895X[119895]
X1015840[119895]x119895X1015840[119895]X[119895]
]
minus1
(8)
as reordered by the permutation matrix Q119895 From the clock-
wise rule the (1 1) element of the inverse is
[x1015840119895(I119899minus P119895) x119895]minus1
= [x1015840119895x119895minus x1015840119895P119895x119895]minus1
=10038171003817100381710038171003817xdagger119895
10038171003817100381710038171003817
2 (9)
in succession for each 119895 = 0 1 119896 where P119895
=
X[119895][X1015840[119895]X[119895]]minus1X1015840[119895]
is the projection operator onto the sub-space S
119901(X[119895]) sub R119899 spanned by the columns of X
[119895] These
in turn enable us to connect 1205812119906119895 119895 = 0 1 119896 and similarly
for centered values 1205812119888119895 119895 = 1 119896 to the geometry of ill-
conditioning as follows
Theorem 4 For models in 1198720let VIF
119906(120573119895) = 120581
2
119906119895 119895 = 0
1 119896 be conventional uncentered VIF119906s in terms of Stew-
artrsquos [9] uncentered collinearity indices These in turn quantifythe extent of collinearities between subspaces through angles (in119889119890119892 as follows
(i) Angles between (x119895S119901(X[119895])) are given by 120579
119895=
arccos[(1minus11205812uj)12
] in succession for 119895 = 0 1 119896
(ii) In particular 1205790
= arccos[(1 minus 11205812u0)12
] quantifiesthe degree of collinearity between the regressors and theconstant vector
(iii) Similarly let B119899X = Z = [z
1 z
119896] be regressors
centered to their means rearrange as [z119895Z[119895]] and let
VIF119888(120573119895) = 120581
2
119888119895 119895 = 1 119896 be centered VIF
119888s in
terms of Stewartrsquos centered collinearity indices Thenangles (in deg) between (z
119895S119901(Z[119895])) are given by
120579119895= arccos[(1 minus 11205812cj)
12] 1 le j le k
Proof From the geometry of the right triangle formed by(x119895P119895x119895) the squared lengths satisfy x
1198952
= P119895x1198952+
x119895minus P119895x1198952 where x
119895minus P119895x1198952
= RS119895is the residual sum
of squares from the projection It follows that the principalangle between (x
119895P119895x119895) is given by
cos (120579119895)=
x1015840119895P119895x119895
10038171003817100381710038171003817x119895
10038171003817100381710038171003817sdot10038171003817100381710038171003817P119895x119895
10038171003817100381710038171003817
=
10038171003817100381710038171003817P119895x119895
1003817100381710038171003817100381710038171003817100381710038171003817x119895
10038171003817100381710038171003817
=radic1 minusRS119895
10038171003817100381710038171003817x119895
10038171003817100381710038171003817
2=radic1 minus
1
1205812119906119895
(10)
for 119895 = 0 1 119896 to give conclusion (i) Conclusion(ii) follows on specializing (x
0P0x0) with x
0= 1119899and
P0= X(X1015840X)
minus1X1015840 Conclusion (iii) follows similarly from thegeometry of the right triangle formed by (z
119895P119895z119895) where P
119895
now is the projection operator onto the subspace S119901(Z[119895]) sub
R119899 spanned by the columns of Z[119895] to complete our proof
Remark 5 Rules of thumb in common use for problematicVIFs include those exceeding 10 or even 4 see [11 24] forexample In angular measure these correspond respectivelyto 120579119895lt 18435 deg and 120579
119895lt 300 deg
42 Reference Models We seek as Reference feasible modelsencoding orthogonalities of types 119881
perp and 119862perp The keys
are as follows (i) to retain essentials of the experimentalstructure and (ii) to alter what may be changed to achieveorthogonality For a model in119872
119908with moment matrix X1015840X
our opening paragraph prescribes as reference the modelR119881
= D = Diag(11988911 119889
119901119901) as diagonal elements of
X1015840X for assessing 119881perp-orthogonality Moreover on scaling
columns of X to equal lengths R119881is perfectly conditioned
with 1198881(R119881) = 10 In addition every model in 119872
119908clearly
conforms with its Reference in the sense that R119881is positive
definite as distinct from models in1198720to follow
Consider again models in 1198720as in (6) let C = (X1015840X minus
M) withM = 119899XX1015840 as the mean-centering matrix and againletD = Diag(119889
11 119889
119896119896) comprise the diagonal elements of
X1015840X
(i) 119881perpReference ModelThe uncentered VIF119906s in119872
0 defined
as ratios of diagonal elements of (X10158400X0)minus1 to reciprocals of
diagonal elements of X10158400X0 appear to have seen exclusive
usage apparently in keepingwithRemark 3However the fol-lowing disclaimermust be registered as the formal equivalentof Remark 1
Theorem 6 Statements that conventional VIF119906s quantify var-
iance inflation owing to nondiagonalX10158400X0are false for models
in1198720having X = 0
Proof Since the Reference variances are reciprocals of diag-onal elements of X1015840
0X0 this usage is predicated on the false
postulate that X10158400X0can be diagonal for X = 0 Specifically
[1119899X] are linearly independent that is 11015840
119899X = 0 if and only
if X has been mean centered beforehand
To the contrary Gunst [25] purports to show thatVIF119906(1205730) registers genuine variance inflation namely the
price to be paid in variance for designing an experimenthaving X = 0 as opposed to X = 0 Since variances forintercepts are Var(sdot | X = 0) = 120590
2119899 and Var(sdot | X = 0) =
120590211988800
ge 1205902119899 from (6) their ratio is shown in [25] to be
1205812
1199060= 11989911988800
ge 1 in the parlance of Section 23 We concede thisto be a ratio of variances but to the contrary not a VIF sincethe parameters differ In particular Var(120573
0| X = 0) = 120590
211988800
whereas Var( | X = 0) = 1205902119899 with 120572 = 120573
0+ 12057311198831+
sdot sdot sdot + 120573119896119883119896in centered regressors Nonetheless we still find
the conventional VIF119906(1205730) to be useful for purposes to follow
Remark 7 Section 3 highlights continuing concern in regardto collinearity of regressors with the constant vectorTheorem 4(ii) and expression (10) support the use of 120579
0=
arccos[(1 minus 11205812u0)12
] as an informative gage on the extent of
6 Advances in Decision Sciences
this occurrence Specifically the smaller the angle the greaterthe extent of such collinearity
Instead of conventional VIF119906s given the foregoing dis-
claimer we have the following amended version as Referencefor uncentered diagnostics altering whatmay be changed butleaving X = 0 intact
Definition 8 Given a model in 1198720with second-moment
matrixX10158400X0 the amendedReferencemodel for assessing119881perp-
orthogonality is R119881
= [ 119899 119899X1015840
119899X D] with D = Diag(119889
11 119889
119896119896)
as diagonal elements of X1015840X provided that R119881is positive
definite We identify a model to be 119881perp-orthogonal when
X10158400X0= R119881
As anticipated a prospective R119881
fails to conform toexperimental data if not positive definite These and furtherprospects are covered in the following where 120601
119894designates
the angle between (1119899X119894) 119894 = 1 119896
Lemma 9 Take R119881
as a prospective Reference for 119881perp-
orthogonality
(i) In order that R119881maybe positive definite it is necessary
that 119899X1015840Dminus1X lt 1 that is that 119899sum119896
119894=1(1198832
119894119889119894119894) lt 1
(ii) Equivalently it is necessary thatsum119896119894=1
cos2(120601119894) lt 1with
120601119894as the angle between (1
119899Xi) 119894 = 1 119896
(iii) The Reference variance for 1205730is Var(120573
0| X1015840X = D) =
[119899(1 minus 119899X1015840Dminus1X)]minus1
(iv) The Reference variances for slopes are given by
Var (120573119894| X1015840X = D) =
1
119889119894119894
[1 +1205741198832
119894
119889119894119894
] 1 le 119894 le 119896 (11)
where 120574 = 119899(1 minus 119899X1015840Dminus1X)
Proof The clockwise rule for determinants gives |R119881| =
|D|[119899 minus 1198992X1015840Dminus1X] Conclusion (i) follows since |D| gt 0
The computation
cos (120601119894) =
11015840119899X119894
1003817100381710038171003817111989910038171003817100381710038171003817100381710038171003817X119894
1003817100381710038171003817=
11015840119899X119894
radic119899119889119894119894
= radic1198991198832
119894
119889119894119894
(12)
in parallel with (10) gives conclusion (ii) Using the clockwiserule for block-partitioned inverses the (1 1) element of Rminus1
119881
is given by conclusion (iii) Similarly the lower right block ofRminus1119881 of order (119896 times 119896) is the inverse of H = D minus 119899XX1015840 On
identifying a = b = X 120572 = 119899 and 119886lowast
119894= 119886119894119889119894119894 in Theorem
833 of [26] we have that Hminus1 = Dminus1 + 120574Dminus1XX1015840Dminus1Conclusion (iv) follows on extracting its diagonal elementsto complete our proof
Corollary 10 For the case 119896 = 2 in order that R119881maybe
positive definite it is necessary that 1206011+ 1206012gt 90 deg
Proof Beginning with Lemma 9(ii) compute
cos (1206011+ 1206012)
= cos1206011cos1206012minus sin120601
1sin1206012
= cos1206011cos1206012minus[1minus(cos2120601
1+cos2120601
2)+cos2120601
1cos21206012]12
(13)
which is lt0 when 1 minus (cos21206011+ cos2120601
2) gt 0
Moreover the matrix R119881itself is intrinsically ill conditioned
owing to X = 0 its condition number depending on X Toquantify this dependence we have the following wherecolumns of X have been standardized to common lengths|X119894| = 119888 1 le 119894 le 119896
Lemma 11 Let R119881= [ 119899 T1015840
T 119888I119896
] as in Definition 8 with T = 119899XD = 119888I
119896 and 119888 gt 0
(i) The ordered eigenvalues of R119881are [120582
1 119888 119888 120582
119901]
where 119901 = 119896 + 1 and (1205821 120582119901) are the roots of 120582 isin
[(119899 + 119888) plusmn radic(119899 minus 119888)2+ 41205912]2 and 120591
2= T1015840T
(ii) The roots are positive andR119881is positive definite if and
only if (119899119888 minus 1205912) gt 0
(iii) If R119881
is positive definite its condition number is1198881(R119881) = 1205821120582119901and is increasing in 120591
2
Proof Eigenvalues are roots of the determinantal equation
Det [(119899 minus 120582) T1015840T (119888 minus 120582) I
119896
]
= 0 lArrrArr (119888 minus 120582)119896minus1
[(119899 minus 120582) (119888 minus 120582) minus T1015840T] = 0
(14)
from the clockwise rule giving (119896 minus 1) values 120582 = 119888 and tworoots of the quadratic equation 120582
2minus (119899 + 119888)120582 + 119899119888 minus 120591
2= 0
to give conclusion (i) Conclusion (ii) holds since the productof roots of the quadratic equation is (119899119888 minus 120591
2) and the greater
root is positive Conclusion (iii) follows directly to completeour proof
(119894119894) 119862perp Reference Model As noted the notion of 119862
perp-or-thogonality applies exclusively for models in 119872
0 Accord-
ingly as Reference we seek to alter X1015840X so that the matrixcomprising sums of squares and sums of products of devia-tions from means thus altered is diagonal To achieve thiscanon of 119862
perp-orthogonality and to anticipate notation forsubsequent use we have the following
Definition 12 (i) For a model in 1198720with second-moment
matrix X10158400X0and inverse as in (6) let C
119908= (W minus M) and
identify an indicator vector G = [11989212 11989213 119892
1119896 11989223
1198922119896 119892
119896minus1119896] of order ((119896 minus 1) times (119896 minus 2)) in lexicographic
order where 119892119894119895= 0 119894 = 119895 if the (119894 119895) element of C
119908is zero
and 119892119894119895= 1 119894 = 119895 if the (119894 119895) element ofC
119908is (X1015840119894X119895minus119899119883119894119883119895)
that is unchanged from C = (X1015840X minus 119899XX1015840) = Z1015840Z withZ = B
119899X as the regressors centered to their means
Advances in Decision Sciences 7
(ii) In particular the Reference model in1198720for assessing
119862perp-orthogonality is R
119862= [ 119899 119899X
1015840
119899X W] such that C
119908= (W minus
M) and its inverse G from (6) are diagonal that is takingW = [119908
119894119895] such that 119908
119894119894= 119889119894119894 1 le 119894 le 119896 and 119908
119894119895=
119898119894119895 for all 119894 = 119895 or equivalently G = [0 0 0] In this
case we identify the model to be 119862perp-orthogonal
Recall that conventional VIF119888s for [120573
1 1205732 120573
119896] are
ratios of diagonal elements of (Z1015840Z)minus1 to reciprocals ofthe diagonal elements of the centered Z1015840Z Apparently thisrests on the logic that 119862perp-orthogonality in 119872
0implies that
Z1015840Z is diagonal However the converse fails and instead isembodied in R
119862isin 1198720 In consequence conventional VIF
119888s
are deficient in applying only to slopes whereas the VF119888s
resulting from Definition 12(ii) apply informatively for all of[1205730 1205731 1205732 120573
119896]
Definition 13 Designate VF119907(sdot) and VF
119888(sdot) as Variance Fac-
tors resulting from Reference models of Definition 8 andDefinition 12(ii) respectively On occasion these representVariance Deflation in addition to VIFs
Essential invariance properties pertain ConventionalVIF119888s are scale invariant see Section 3 of [9]We see next that
they are translation invariant as well To these ends we shiftthe columns of X=[X
1X2 X
119896] to a new origin through
X rarr X minus L = H where L = [11988811119899 11988821119899 119888
1198961119899] The
resulting model is
Y = (12057301119899+ L120573) +H120573 + 120598
lArrrArr Y = (1205730+ 120595) 1
119899+H120573 + 120598
(15)
thus preserving slopes where120595 = c1015840120573=(11988811205731+11988821205732+sdot sdot sdot+119888
119896120573119896)
Corresponding to X0= [1119899X] is U
0= [1119899H] and to X1015840
0X0
and its inverse are
U10158400U0= [
119899 11015840119899H
H10158401119899
H1015840H] (U10158400U0)minus1
= [11988700
b01
b101584001
G ] (16)
in the form of (6)This pertains to subsequent developmentsand basic invariance results emerge as follows
Lemma 14 Consider Y = 12057301119899+ X120573 + 120598 together with the
shifted version Y = (1205730+ 120595)1
119899+ H120573 + 120576 both in 119872
0 Then
the matrices G appearing in (6) and (16) are identical
Proof Rules for block-partitioned inverses again assert thatG of (16) is the inverse of
(H1015840H minus 119899minus1H10158401
11989911015840119899H) = H1015840 (I
119899minus 119899minus1111989911015840119899)H
= H1015840B119899H = X1015840B
119899X
(17)
since B119899L = 0 to complete our proof
These facts in turn support subsequent claims that cen-tered VIF
119888s are translation and scale invariant for slopes
1015840
=
[1205731 1205732 120573
119896] apart from the intercept 120573
0
43 A Critique Again we distinguish VIF119888s and VIF
119906s from
centered and uncentered regressorsThe following commentsapply
(C1) A design is either 119881perp or 119862perp-orthogonal respectivelyaccording as the lower right (119896times119896) blockX1015840X ofX1015840
0X0
orC = (X1015840Xminus119899XX1015840) from expression (6) is diagonalOrthogonalities of type 119881perp and 119862
perp are exclusive andhence work at crossed purposes
(C2) In particular 119862perp-orthogonality holds if and only ifthe columns of X are 119881
perp-nonorthogonal If X is 119862perp-orthogonal then for 119894 = 119895 C
119908(119894119895) = X1015840
119894X119895minus 119899119883119894119883119895=
0 and X1015840119894X119895
= 0 Conversely if X indeed is 119881perp-
orthogonal so that X1015840X = D= Diag(11988911 119889
119896119896)
then (D minus 119899XX1015840) cannot be diagonal as Reference inwhich case the conventional VIF
119888s are tenuous
(C3) Conventional VIF119906s based on uncentered X in 119872
0
with X = 0 do not gage variance inflation as claimedfounded on the false tenet that X1015840
0X0can be diagonal
(C4) To detect influential observations and to classify highleverage points casendashinfluence diagnostics namely119891119895119868
= VIF119895(minus119868)
VIF119895 1 le 119895 le 119896 are studied in [27]
for assessing the impact of subsets on variance infla-tion Here VIF
119895is from the full data and VIF
119895(minus119868)on
deleting observations in the index set 119868 Similarly [28]proposes using VIF
(119894)on deleting the 119894th observation
In the present context these would gain substance onmodifying VF
119907and VF
119888accordingly
5 Case Study 1 Continued
51 The Setting We continue an elementary and transparentexample to illustrate essentials Recall119872
0 119884119894= 1205730+12057311198831+
12057321198832+120598119894 of Section 22 the designX
0= [15X1X2] of order
(5 times 3) and U = X10158400X0 and its inverse V = (X10158400X0)minus1 as in
expressions (1) The design is neither 119881perp nor 119862perp orthogonalsince neither the lower right (2 times 2) block of X1015840
0X0nor the
centered Z1015840Z is diagonal Moreover the uncentered VIF119906s as
listed in (3) are not the vaunted relative increases in variancesowing to nonorthogonal columns of X
0 Indeed the only
opportunity for 119881perp-orthogonality here is between columns
[X1X2]
Nonetheless from Section 41 we utilize Theorem 4(i)and (10) to connect the collinearity indices of [9] to anglesbetween subspaces Specifically Minitab recovered valuesfor the residuals RS
119895= x
119895minus P119895x1198952 119895 = 0 1 2 fur-
ther computations proceed routinely as listed in Table 1 Inparticular the principal angle 120579
0between the constant vector
and the span of the regressor vectors is 1205790= 31751 deg as
anticipated in Remark 7
52 119881perp-Orthogonality For subsequent reference let 119863(119909) bea design as in (1) but with second-moment matrix
U (119909) = [
[
5 3 1
3 25 119909
1 119909 3
]
]
(18)
8 Advances in Decision Sciences
Table 1 Values for residuals RS119895= x119895minus P119895x1198952 for X
1198952 for 1205812
119906119895= VI F
119906(120573119895) for (1 minus 1120581
2
119906119895)12 and for 120579
119895 for 119895 = 0 1 2
119895 RS119895
X1198952
1205812
119895(1 minus 1120581
2
119906119895)12
120579119895(deg)
0 13846 50 36111 08503 317511 06429 25 38889 08619 304702 25714 30 11667 03780 67792
Invoking Definition 8 we seek a 119881perp-orthogonal Reference
with moment matrix U(0) having X10158401X2
= 0 that is 119863(0)This is found constructively on retaining [1
5 ∙X2] in X
0
but replacing X1by X01
= [1 05 05 1 0]1015840 as listed in
Table 2 giving the design 119863(0) with columns [15X01X2] as
ReferenceAccordingly R
119881= U(0) is ldquoas 119881
perp-orthogonal as it cangetrdquo given the lengths and sums of (X
1X2) as prescribed
in the experiment and as preserved in the Reference modelLemma 9 with X=[06 02]
1015840 and D= Diag(25 3) givesX1015840Dminus1X = 0157333 and 120574 = 5(1 minus 5X1015840Dminus1X) =
234375 Applications of Lemma 9(iii)-(iv) in succession givethe reference variances
Var (1205730| X1015840X = D) = [5 (1 minus 5X1015840Dminus1X)]
minus1
= 09375
Var (1205731|X1015840X=D)=
1
11988911
[1+1205741198832
1
11988911
]=2
5[1+
072120574
5]=17500
Var (1205732|X1015840X=D)=
1
11988922
[1+1205741198832
2
11988922
]=1
3[1+
004120574
3]=04375
(19)
As Reference these combine with actual variances from119881 = [119881
119894119895] at (1) giving VF
119907for the original designX
0relative
to R119881 For example VF
119907(1205730) = 0722209375 = 07704 with
11988111
= 07222 from (1) and 09375 from (19) This contrastswith VIF
119906(1205730) = 36111 in (3) as in Section 22 and Table 2
it is noteworthy that the nonorthogonal design 119863(1) yieldsuniformly smaller variances than the 119881
perp-orthogonal R119881
namely119863(0)Further versions 119863(119909) 119909 isin [minus05 00 05 06 10 15]
of our basic design are listed for comparison in Table 2 where119909 = X1015840
1X2for the various designs The designs themselves are
exhibited on transposing rowsX10158401= [11988311 11988312 11988313 11988314 11988315]
from Table 2 and substituting each into the template X00
=
[15 ∙X2] The design 119863(2) was constructed but not listed
since its X10158400X0is not invertible Clearly these are actual
designs amenable to experimental deployment
53 119862perp-Orthogonality Continuing X0
= [15X1X2] and
invoking Definition 12(ii) we seek a 119862perp-orthogonal
Reference having the matrix C119908
as diagonal This is
identified as 119863(06) in Table 2 From this the matrix R119862and
its inverse are
R119862= [
[
5 3 1
3 25 06
1 06 3
]
]
Rminus1119862
= [
[
07286 minus08572 minus00714
minus08571 14286 00
minus00714 00 03571
]
]
(20)
The variance factors are listed in (4) where for exampleVF119888(1205732) = 0388903571 = 10889 As distinct from conven-
tional VIF119888s for 120573
1and 120573
2only our VF
119888(1205730) here reflects
Variance Deflationwherein 1205730is estimated more precisely in
the initial 119862perp-nonorthogonal designA further note on orthogonality is germane Suppose
that the actual experiment is 119863(0) yet towards a thoroughdiagnosis the user evaluates the conventional VIF
119888(1205731) =
12250 = VIF119888(1205732) as ratios of variances of 119863(0) to 119863(06)
from Table 2 Unfortunately their meaning is obscured sincea design cannot at once be 119881perp and 119862
perp-orthogonalAs in Remark 2 we next alter the basic designX
0at (1) on
shifting columns of themeasurements to [0 0] asminima andscaling to have squared lengths equal to 119899 = 5 The resultingX0follows directly from (1) The new matrix U = X
1015840
0X0 andits inverse V are
U = [
[
5 42426 42426
42426 5 4
42426 4 5
]
]
V = [
[
1 minus04714 minus04714
minus04714 07778 minus02222
minus04714 minus02222 07778
]
]
(21)
giving conventional VIF119906s as [5 38889 38889] Against 119862perp-
orthogonality this gives the diagnostics
[VF119888(1205730) VF119888(1205731) VF119888(1205732)] = [08140 10889 10889]
(22)
demonstrating in comparison with (4) that VF119888s apart from
1205730 are invariant under translation and scaling a consequence
of Lemma 14We further seek to compare variances for the shifted
and scaled (X1X2) against 119881perp-orthogonality as Reference
However from U = X10158400X0at (21) we determine that 119883
1=
084853 = 1198832and 5[(119883
2
15) + (119883
2
25)] = 14400 Since
the latter exceeds unity we infer from Lemma 9(i) that
Advances in Decision Sciences 9
Table 2 Designs 119863(119909) 119909 isin [minus05 00 05 06 10 15] obtained by substitutingX01= [11988311 11988312 11988313 11988314 11988315]1015840 in [15 ∙X2] of order (5times3)
and corresponding variances Var (1205730)Var (120573
1)Var (120573
2)
Design 11988311
11988312
11988313
11988314
11988315
Var (1205730) Var (120573
1) Var (120573
2)
119863(minus05) 10 00 05 10 05 19333 37333 09333119863(00) 10 05 05 10 00 09375 17500 04375119863(05) 05 10 00 10 05 07436 14359 03590119863(06) 06091 05793 06298 11819 0000 07286 14286 03570119863(10) 00 05 05 10 10 07222 15556 03889119863(15) 00 10 05 05 10 09130 24348 06087
119881perp-orthogonality is incompatible with this configuration
of the data so that the VF119907s are undefined Equivalently
on evaluating 1206011
= 1206012
= 2290 deg and 1206011+ 1206012
=
4580 lt 90 deg Corollary 10 asserts that R119881is not positive
definite This appears to be anomalous until we recallthat 119881perp-orthogonality is invariant under rescaling but notrecentering the regressors
54 A Critique We reiterate apparent ambiguity ascribedin the literature to orthogonality A number of widely heldguiding precepts has evolved as enumerated in part in ouropening paragraph and in Section 3 As in Remark 3 theseclearly have been taken to apply verbatim forX
0andX1015840
0X0in
1198720 to be paraphrased in part as follows(P1) Ill-conditioning espouses inflated variances that is
VIFs necessarily equal or exceed unity(P2) 119862perp-orthogonal designs are ldquoidealrdquo in that VIF
119888s for
such designs are all unity see [11] for exampleWe next reassess these precepts as they play out under119881perp and 119862
perp orthogonality in connection with Table 2(C1) For models in 119872
0having X = 0 we reiterate that
the uncentered VIF119906s at expression (3) namely
[36111 38889 11667] overstate adverse effects onvariances of the 119881perp-nonorthogonal array since X1015840
0X0
cannot be diagonal Moreover these conventionalvalues fail to discern that the revised values for VF
119907s
at (5) namely [07704 08889 08889] reflect that[1205730 1205731 1205732] are estimated with greater efficiency in the
119881perp-nonorthogonal array119863(1)
(C2) For 119881perp-orthogonality the claim P1 thus is false by
counterexampleTheVFs at (5) areVarianceDeflationFactors for design 119863(1) where X1015840
1X2= 1 relative to
the 119881perp-orthogonal119863(0)(C3) Additionally the variances Var[120573
0(119909)] Var[120573
1(119909)]
Var[1205732(119909)] in Table 2 are all seen to decrease from
119863(0) [09375 17500 04375] rarr 119863(1) [0722215556 03889] despite the transition from the 119881
perp-orthogonal 119863(0) to the 119881
perp-nonorthogonal 119863(1)Similar trends are confirmed in other ill-conditioneddata sets from the literature
(C4) For 119862perp-orthogonal designs the claim P2 is false by
counterexample Such designs need not be ldquoidealrdquoas 1205730may be estimated more efficiently in a 119862
perp-nonorthogonal design as demonstrated by the VF
119888s
for 1205730at (4) and (22) Similar trends are confirmed
elsewhere as seen subsequently
(C5) VFs for 1205730are critical in prediction where prediction
variances necessarily depend on Var(1205730) especially
for predicting near the origin of the system of coor-dinates
(C6) Dissonance between 119881perp and 119862
perp is seen in Table 2where 119863(06) as 119862
perp-orthogonal with X10158401X2
= 06is the antithesis of 119881perp-orthogonality at 119863(0) whereX10158401X2= 0
(C7) In short these transparent examples serve to dispelthe decades-old mantra that ill-conditioning neces-sarily spawns inflated variances formodels in119872
0 and
they serve to illuminate the contributing structures
6 Orthogonal and Linked Arrays
A genuine 119881perp-orthogonal array was generated as eigenvec-
tors E = [E1E2 E
8] from a positive definite (8 times 8)
matrix (see Table 83 of [11 p377]) using PROC IML of theSAS programming package The columns [X
1X2X3] as the
second through fourth columns of Table 3 comprise the firstthree eigenvectors scaled to length 8 These apply for modelsin 119872119908and 119872
0to be analyzed In addition linked vectors
(Z1Z2Z3) were constructed as Z
1= 119888(09E
1+ 01E
2)
Z2
= 119888(09E2+ 01E
3) and Z
3= 119888(09E
3+ 01E
4) where
119888 = radic8082 Clearly these arrays as listed in Table 3 are notarchaic abstractions as both are amenable to experimentalimplementation
61 The Model 1198720 We consider in turn the orthogonal and
the linked series
611 Orthogonal Data Matrices X10158400X0and (X1015840
0X0)minus1 for the
orthogonal data under model1198720are listed in Table 4 where
variances occupy diagonals of (X10158400X0)minus1 The conventional
uncentered VIF119906s are [120296 102144 112256 105888]
Since X = 0 we find the angle between the constant andthe span of the regressor vectors to be 120579
0= 65748 deg as
in Theorem 4(ii) Moreover the angle between X1and the
span of [1119899X2X3] namely 120579
1= 81670 deg is not 90 deg
because of collinearity with the constant vector despite themutual orthogonality of [X
1X2X3] Observe here thatX1015840
0X0
10 Advances in Decision Sciences
Table 3 Basic 119881perp-orthogonal design X0 = [18 X1 X2 X3] and a linked design Z0 = [18 Z1 Z2 Z3] of order (8 times 4)
Orthogonal (X1X2X3) Linked (Z
1Z2Z3)
18
X1
X2
X3
18
Z1
Z2
Z3
1 minus1084470 0056899 0541778 1 minus1071550 0116381 06534861 minus1090010 minus0040490 0117026 1 minus1087820 minus0027320 01160861 0979050 0002564 2337454 1 0973345 0260677 21996871 minus1090230 0016839 0170711 1 minus1081700 0035588 00706941 0127352 2821941 minus0071670 1 0438204 2796767 minus00707101 1045409 minus0122670 minus1476870 1 1025468 minus0285010 minus16185601 1090803 minus0088970 0043354 1 1074306 minus0083640 01769281 1090739 minus0092300 0108637 1 1073875 minus0079740 0244567
Table 4 Matrices X10158400X0 and (X10158400X0)minus1 for the orthogonal data under model119872
0
X10158400X0
(X10158400X0)minus1
8 10686 25538 17704 01504 minus00201 minus00480 minus0033310686 8 0 0 minus00201 01277 00064 0004425538 0 8 0 minus00480 00064 01403 0010617704 0 0 8 minus00333 00044 00106 01324
already is 119881perp-orthogonal accordingly R119881= X10158400X0 and thus
the VF119907s are all unity
In view of dissonance between 119881perp and 119862
perp orthogonalitythe sums of squares and products for the mean-centered Xare
X1015840B119899X = [
[
78572 minus03411 minus02365
minus03411 71847 minus05652
minus02365 minus05652 76082
]
]
(23)
reflecting nonnegligible negative dependencies to distin-guish the 119881
perp-orthogonal X from the 119862perp-nonorthogonal
matrix Z = B119899X of deviations
To continue we suppose instead that the 119881perp-orthogonalmodel was to be recast as 119862
perp-orthogonal The Referencemodel and its inverse are listed in Table 5 Variance factorsas ratios of diagonal elements of (X1015840
0X0)minus1 in Table 4 to those
of Rminus1119862
in Table 5 are
[VF119888(1205730) VF119888(1205731) VF119888(1205732) VF119888(1205733)]
= [10168 10032 10082 10070]
(24)
reflecting negligible differences in precision This parallelsresults reported in Table 2 comparing 119863(0) to 119863(06) asReference except for larger differences in Table 2 namely[128681225012250]
612 Linked Data For the Linked Data under model 1198720
the matrix Z10158400Z0 follows routinely from Table 3 its inverse(Z10158400Z0)minus1 is listed in Table 6 The conventional uncentered
VIF119906s are now [120320 103408 113760 105480] These
again fail to gage variance inflation since Z10158400Z0cannot be
diagonal From (10) the principal angle between the constantvector and the span of the regressor vectors is 120579
0= 65735
degrees
Remark 15 Observe that the choice for R119862may be posed as
seeking R119862
= U(119909) such that Rminus1119862(2 3) = 00 then solving
numerically for 119909 using Maple for example However thealgorithm in Definition 12(ii) affords a direct solution thatC119908should be diagonal stipulates its off-diagonal element as
X10158401X2minus 35 = 0 so that X1015840
1X2= 06 in R
119862at expression (20)
To illustrate Theorem 4(iii) we compute cos(120579) = (1 minus
110889)12 = 02857 and 120579 = 73398 deg as the angle between
the vectors [X1X2] of (1) when centered to their means For
slopes in 1198720 [9] shows that VIF
119888(120573119895) lt VIF
119906(120573119895) 1 le
119894 le 119896 This follows since their numerators are equal butdenominators are reciprocals of lengths of the centered anduncentered regressor vectors To illustrate numerators areequal for VIF
119906(1205733) = 03889(13) = 11667 and for
VIF119888(1205733) = 03889(128) = 10889 but denominators are
reciprocals of X10158402X2= 3 and sum
5
119895=1(1198832119895
minus 1198832)2= 28
(119894) 119881perp Reference Model To rectify this malapropism we seek
in Definition 8 a model R119881as Reference This is found on
setting all off-diagonal elements of Z10158400Z0to zero excluding
the first row and column Its inverse Rminus1119881
is listed in Table 6The VF
119907s are ratios of diagonal elements of (Z1015840
0Z0)minus1 on the
left to diagonal elements of Rminus1119881
on the right namely
[VF119907(1205730) VF119907(1205731) VF119907(1205732) VF119907(1205733)]
= [09697 09991 09936 09943]
(25)
Against conventional wisdom it is counter-intuitive that[1205730 1205731 1205732 1205733] are all estimated with greater precision in the
119881perp-nonorthogonal model Z1015840
0Z0than in the 119881
perp-orthogonalmodel R
119881of Definition 8 As in Section 54 this serves again
to refute the tenet that ill-conditioning espouses inflated
Advances in Decision Sciences 11
Table 5 Matrices R119862and Rminus1
119862for checking 119862
perp-orthogonality in the orthogonal data X0 under model1198720
R119862
Rminus1119862
8 10686 25538 17704 01479 minus00170 minus00444 minus0029110686 8 03411 02365 minus00170 01273 0 025538 03411 8 05652 minus00444 0 01392 017704 02365 05652 8 minus00291 0 0 01314
Table 6 Matrices (Z10158400Z0)minus1 and Rminus1
119881for checking 119881
perp-orthogonality for the linked data under model1198720
(Z10158400Z0)minus1 Rminus1
119881
01504 minus00202 minus00461 minus00283 01551 minus00261 minus00530 minus00344minus00202 01293 minus000797 00053 minus00261 01294 00089 00058minus00461 minus00079 01422 minus00054 minus00530 00089 01431 00117minus00283 00053 minus00054 01319 minus00344 00058 00117 01326
variances for models in1198720 that is that VFs necessarily equal
or exceed unity(119894119894) 119862
perp Reference Model Further consider variances werethe Linked Data to be centered As in Definition 12(ii)we seek a Reference model R
119862such that C
119908= (W minus
M) is diagonal The required R119862
and its inverse arelisted in Table 7 Since variances in the Linked Data are[015040 012926 014220 013185] from Table 6 and Refer-ence values appear as diagonal elements of Rminus1
119862on the right
in Table 7 their ratios give the centered VF119888s as
[VF119888(1205730) VF119888(1205731) VF119888(1205732) VF119888(1205733)]
= [09920 10049 10047 10030]
(26)
We infer that changes in precision would be negligible ifinstead the Linked Data experiment was to be recast as a 119862perp-orthogonal experiment
As in Remark 15 the reference matrix R119862is constrained
by the first and second moments from X10158400X0and has free
parameters 1199091 1199092 1199093 for the (2 3) (2 4) (3 4) entries
The constraints for 119862perp-orthogonality are Rminus1
119862(2 3) =
Rminus1119862(2 4) = Rminus1
119862(3 4) = 0 Although this system of
equations can be solved with numerical software such asMaple the algorithm stated in Definition 12 easily yields thedirect solution given here
62 The Model 119872119908 For the Linked Data take 119884
119894= 12057311198851198941+
12057321198851198942
+ 12057331198851198943
+ 120598119894 1 le 119894 le 8 as Y = Z120573 + 120598
where the expected response at 1198851
= 1198852
= 1198853
= 0
is 119864(119884) = 00 and there is no occasion to translate theregressorsThe lower right (3 times 3) submatrix ofZ1015840
0Z0(4 times 4)
from 1198720is Z1015840Z its inverse is listed in Table 8 The ratios
of diagonal elements of (Z1015840Z)minus1 to reciprocals of diagonalelements of Z1015840Z are the conventional uncentered VIF
119906s
namely [10123 10247 10123] Equivalently scaling Z1015840Z tohave unit diagonals and inverting gives Cminus1 as in Table 8 itsdiagonal elements are the VIF
119906sThroughout Section 6 these
comprise the only correct interpretation of conventionalVIF119906s as genuine variance ratios
7 Case Study Body Fat Data
71 The Setting Body fat and morphogenic measurementswere reported for 119899 = 20 human subjects in Table D11 of[29 p717] Response and regressor variables are119884 amount ofbody fat X
1 triceps skinfold thickness X
2 thigh circumfer-
ence and X3 mid-arm circumference under119872
0 119884119894= 1205730+
12057311198831+12057321198832+12057331198833+ 120598119894 From the original data as reported
the condition number of X10158400X0is 1039931 and the VIF
119906s are
[2714800 4469419 630998 778703] On scaling columnsof X to equal column lengths X
119894 = 50 119894 = 1 2 3 the
resulting condition number of the revisedX10158400X0is 2 9696518
and the uncenteredVIF119906s are as before from invariance under
scalingAgainst complaints that regressors are remote from their
natural origins we center regressors to [0 0 0] as origin onsubtracting the minimal element from each column as inRemark 2 and scaling these to X
119894 = 50 119894 = 1 2 3
This is motivated on grounds that centered diagnostics apartfrom VIF
119888(1205730) are invariant under shifts and scalings from
Lemma 14 In addition under the new origin [0 0 0] 1205730
assumes prominence as the baseline response against whichchanges due to regressors are to be gaged The original dataare available on the publisherrsquos web page for [29]
Taking these shifted and scaled vectors as columnsof X = [X
1X2X3] in the revised X
0= [1
119899X]
values for X10158400X0
and its inverse are listed in Table 9The condition number of X1015840
0X0
is now 1136969 andits VIF
119906s are [6775617998742782174484] Additionally
from Theorem 4(ii) we recover the angle 1205790= 22592 deg to
quantify collinearity of regressors with the constant vectorIn addition the scaled and centered correlationmatrix for
X is
R = [
[
10 00847 08781
00847 10 01424
08781 01424 10
]
]
(27)
indicating strong linkage between mean-centered versions of(X1X3)Moreover the experimental design is neither119881perp nor
119862perp-orthogonal since neither the lower right (3 times 3) block of
X10158400X0nor the centered matrixC = (X1015840Xminus119899XX1015840) is diagonal
12 Advances in Decision Sciences
Table 7 Matrices R119862and Rminus1
119862for checking 119862
perp-orthogonality in the linked data under model1198720
R119862
Rminus1119862
8 13441 27337 17722 01516 minus00216 minus00484 minus0029113441 8 04593 02977 minus00216 01286 0 027337 04593 8 06056 minus00484 0 01415 017722 02977 06056 8 minus00291 0 0 01314
Table 8 Matrices (Z1015840Z)minus1 and Cminus1 for the linked data under model119872119908
(Z1015840Z)minus1 Cminus1
01265 minus00141 00015 10123 minus01125 00123minus00141 01281 minus00141 minus01125 10247 minus0112500015 minus00141 01265 00123 minus01125 10123
(119894) 119881perp Reference Model To compare this with a 119881
perp-or-thogonal design we turn again to Definition 8 Howeverevaluating the test criterion in Lemma 9(119894) shows that thisconfiguration is not commensurate with 119881
perp-orthogonalityso that the VF
119907s are undefined Comparisons with 119862
perp-orthogonal models are addressed next(119894119894) 119862perp Reference Model As in Definition 12(ii) the centering
matrix is
119899XX1015840 = [
[
188889 189402 187499
189402 189916 188007
187499 188007 186118
]
]
(28)
To check against 119862perp-orthogonality we obtain the matrix Wof Definition 12(ii) on replacing off-diagonal elements of thelower right (3 times 3) submatrix ofX1015840
0X0by corresponding off-
diagonal elements of 119899XX1015840 The result is the matrix R119862and
its inverse as given in Table 10 where diagonal elements ofRminus1119862
are the Reference variances The VF119888s found as ratios
of diagonal elements of (X10158400X0)minus1 relative to those of Rminus1
119862
are listed in Table 12 under G = [0 0 0] the indicator ofDefinition 12(i) for this case
72 Variance Factors and Linkage Traditional gages of ill-conditioning are patently absurd on occasion By conventionVIFs are ldquoall or nonerdquo wherein Reference models entailstrictly diagonal components In practice some regressorsare inextricably linked to get or even visualize orthogonalregressors may go beyond feasible experimental rangesrequire extrapolation beyond practical limits and challengecredulity Response functions so constrained are described in[30] as ldquopicket fencesrdquo In such cases taking C to be diagonalas its 119862perp Reference is moot at best an academic folly abettedin turn by default in standard software packages In shortconventional diagnostics here offer answers to irrelevantquestions Given pairs of regressors irrevocably linked weseek instead to assess effects on variances for other pairsthat could be unlinked by design These comments applyespecially to entries under G = [0 0 0] in Table 12
To proceed we define essential ill-conditioning as regres-sors inextricably linked necessarily to remain so and asnonessential if regressors could be unlinked by design As a
followup to Section 32 for X = 0 we infer that the constantvector is inextricably linked with regressors thus accountingfor essential ill-conditioning not removed by centeringcontrary to claims in [4] Indeed this is the essence ofDefinition 8
Fortunately these limitations may be remedied throughReference models adapted to this purpose This in turnexploits the special notation and trappings of Definition 12(i)as follows In addition toG = [0 0 0] we iterate for indicatorsG = [119892
12 11989213 11989223] taking values in [1 0 0] [0 1 0] [0 0 1]
[1 1 0] [1 0 1] [0 1 1] Values of VF119888s thus obtained are
listed in Table 12 To fix ideas the Reference R119862for G =
[1 0 0] that is for the constraint R119862(2 3) = X1015840
0X0(2 3) =
194533 is given in Table 11 together with its inverse Foundas ratios of diagonal elements of (X1015840
0X0)minus1 relative to those of
Rminus1119862
in Table 11 VF119888s are listed in Table 12 underG = [1 0 0]
In short R119862is ldquoas 119862
perp-orthogonal as it could berdquo given theconstraint Other VF
119888s in Table 12 proceed similarly For all
cases in Table 12 excluding G = [0 1 1] 1205730is estimated
with greater efficiency in the original model than any of thepostulated Reference models
As in Remark 15 the reference matrix R119862
for G =
[1 0 0] is constrained by X10158401X2
= 194533 to be retainedwhereas X1015840
1X3
= 1199091and X1015840
2X3
= 1199092are to be determined
from Rminus1119862(2 4) = Rminus1
119862(3 4) = 0 Although this system
can be solved numerically as noted the algorithm stated inDefinition 12 easily yields the direct solution given here
Recall 1198831 triceps skinfold thickness 119883
2 thigh cir-
cumference and 1198833 mid-arm circumference and from
(27) that (1198831 1198833) are strongly linked Invoking the cutoff
rule [24] for VIFs in excess of 40 it is seen for G =
[0 0 0] in Table 12 that VF119888(1205731) and VF
119888(1205733) are excessive
where [X1X2X3] are forced to be mutually uncorrelated
as Reference Remarkably similar values are reported forG isin [1 0 0] [ 0 0 1] [1 0 1] all gaged against intractableReference models wherein (119883
1 1198833) are postulated to be
uncorrelated however incrediblyOn the other hand VF
119888s for G isin [0 1 0] [1 1 0]
[0 1 1] in Table 12 are likewise comparable where (X1X3)
are allowed to remain linked at X10158401X3= 242362 Here the
VF119888s reflect negligible changes in efficiency of the estimates
Advances in Decision Sciences 13
Table 9 Matrices X10158400X0 and (X10158400X0)minus1 for the body fat data under model119872
0centered to minima and scaled to equal lengths
X10158400X0
(X10158400X0)minus1
20 194365 194893 192934 03388 minus01284 minus01482 minus00203194365 25 194533 242362 minus01284 07199 00299 minus06224194893 194533 25 196832 minus01482 00299 01711 minus00494192934 242362 196832 25 minus00203 minus06224 minus00494 06979
Table 10 Matrices R119862and Rminus1
119862for the body fat data adjusted to give off-diagonal elements the value 00 with code G = [0 0 0]
R119862
Rminus1119862
20 194365 194893 192934 05083 minus01590 minus01622 minus01510194365 25 189402 187499 minus01590 01636 0 0194893 189402 25 188007 minus01622 0 01664 0192934 187499 188007 25 minus01510 0 0 01565
in comparison with the originalX10158400X0 where [X
1X2X3] are
all linked In summary negligible changes in overall efficiencywould accrue on recasting the experiment so that (X
1X2)
and (X2X3) were pairwise uncorrelated
Table 12 conveys further useful information on notingthat VFs as ratios are multiplicative and divisible Forexample take the model codedG = [1 1 0] whereX1015840
1X2and
X10158401X3are retained at their initial values under X1015840
0X0 namely
194533 and 242362 from Table 9 with (X2X3) unlinked
Now taking G = [0 1 0] as reference we extract the VFson dividing elements in Table 12 at G = [0 1 0] by those ofG = [1 1 0] The result is
[VF (1205730) VF (120573
1) VF (120573
2) VF (120573
3)]
= [09196
0984810073
1005710282
1020810208
10208]
= [09338 10017 10072 10000]
(29)
8 Conclusions
Our goal is clarity in the actual workings of VIFs as ill-conditioning diagnostics thus to identify and rectifymiscon-ceptions regarding the use of VIFs for models X
0= [1119899X]
with intercept Would-be arbiters for ldquocorrectrdquo usage havedivided between VIF
119888s for mean-centered regressors and
VIF119906s for uncentered data We take issue with both Conven-
tional but clouded vision holds that (i) VIF119906s gage variance
inflation owing to nonorthogonal columns of X0as vectors
in R119899 (ii) VIFs ge 10 and (iii) models having orthogonalmean-centered regressors are ldquoidealrdquo in possessing VIF
119888s of
value unity Reprising Remark 3 these properties widely andcorrectly understood for models in119872
119908 appear to have been
expropriated without verification for models in1198720hence the
anomaliesAccordingly we have distinguished vector-space orthog-
onality (119881perp) of columns of X0 in contrast to orthogonality
(119862perp) of mean-centered columns of X A key feature is theconstruction of second-moment arrays as Reference models(R119881R119862) encoding orthogonalities of these types and their
Variance Factors as VF119907rsquos and VF
119888rsquos respectively Contrary
to convention we demonstrate analytically and through casestudies that (i1015840) VIF
119906s do not gage variance inflation (ii1015840)
VFs ge 10 need not hold whereas VF lt 10 representsVariance Deflation and (iii1015840) models having orthogonalcentered regressors are not necessarily ldquoidealrdquo In particularvariance deflation occurs when ill-conditioned data yieldsmaller variances than corresponding orthogonal surrogates
In short our studies call out to adjudicate the prolongeddispute regarding centered and uncentered diagnostics formodels in119872
0 Our summary conclusions follow
(S1) The choice of a model with or without intercept isa substantive matter in the context of each exper-imental paradigm That choice is fixed beforehandin a particular setting is structural and is beyondthe scope of the present study Snee and Marquardt[10] correctly assert that VIFs are fully creditedand remain undisputed in models without interceptthese quantities unambiguously retain the meaningsoriginally intended namely as ratios of variances togage effects of nonorthogonal regressors For modelswith intercept the choice is between centered anduncentered diagnostics which is to be weighed asfollows
(S2) ConventionalVIF119888s apply incompletely to slopes only
not necessarily grasping fully that a system is illconditioned Of particular concern is missing evi-dence on collinearity of regressors with the interceptOn the other hand if 119862
perp-orthogonality pertainsthen VF
119888s may supplant VIF
119888s as applicable to all of
[1205730 1205731 120573
119896] Moreover the latter may reveal that
VF119888(1205730) lt 10 as in cited examples and of special
import in predicting near the origin of the coordinatesystem
(S3) Our VF119907s capture correctly the concept apparently
intended by conventional VIF119906s Specifically if 119881perp-
orthogonality pertains then VF119907s as genuine ratios
of variances may supplant VIF119906s as now discredited
gages for variance inflation(S4) Returning to Section 32 we subscribe fully but for
different reasons to Belsleyrsquos and othersrsquo contention
14 Advances in Decision Sciences
Table 11 Matrices R119862and Rminus1
119862for the body fat data adjusted to give off-diagonal elements the value 00 except X10158401X2 code G = [1 0 0]
R119862
Rminus1119862
20 194365 194893 192934 04839 minus01465 minus01497 minus01510194365 25 194533 187499 minus01465 01648 minus00141 0194893 194533 25 188007 minus01497 minus00141 01676 0192934 187499 188007 25 minus01510 0 0 01565
Table 12 Summary VF119888s for the body fat data centered to minima and scaled to common lengths identified with varying G = [119892
12 11989213 11989223]
from Definition 12
Elements of G VF119888s
11989212
11989213
11989223
1205730
1205731
1205732
1205733
0 0 0 06665 43996 10282 44586
1 0 0 07002 43681 10208 44586
0 1 0 09196 10073 10282 10208
0 0 1 07201 43996 10073 43681
1 1 0 09848 10057 10208 10208
1 0 1 07595 43681 10003 43681
0 1 1 10248 10073 10073 10160
that uncentered VIF119906s are essential in assessing ill-
conditioned data In the geometry of ill-conditioningthese should be retained in the pantheon of regres-sion diagnostics not as now debunked gages ofvariance inflation but as more compelling angularmeasures of the collinearity of each column of X
0=
[1119899X1X2 X
119896] with the subspace spanned by
remaining columns(S5) Specifically the degree of collinearity of regressors
with the intercept is quantified by 1205790deg which
if small is known to ldquocorrupt the estimates of allparameters in the model whether or not the interceptis itself of interest and whether or not the data havebeen centeredrdquo [21 p90]This in turnwould appear toelevate 120579
0in importance as a critical ill-conditioning
diagnostic(S6) In contrast Theorem 4(iii) identifies conventional
VIF119888s with angles between a centered regressor and
the subspace spanned by remaining centered regres-sors For example
1205791= arccos([
[
1 minus1
VIFc ( ldquo1205731)]
]
12
) (30)
is the angle between the centered vector Z1= B119899X1
and the remaining centered vectors These lie in thelinear subspace of R119899 comprising the orthogonalcomplement to 1
119899isin R119899 and thus bear on the
geometry of ill-conditioning in this subspace(S7) Whereas conventional VIFs are ldquoall or nonerdquo to
be gaged against strictly diagonal components ourReference models enable unlinking what may beunlinked while leaving other pairs of regressorslinked This enables us to assess effects on variances
attributed to allowable partial dependencies amongsome but not other regressors
In conclusion the foregoing summary points to limita-tions in modern software packages Minitab v15 119878119875119878119878 v19and 119878119860119878 v92 are standard software packages which computecollinearity diagnostics returning with the default optionsthe centered VIFs VIF
119888(120573119895) 1 le 119895 le 119896 To compute
the uncentered VIF119906s one needs to define a variable for
the constant term and perform a linear regression using theoptions of fitting without an intercept On the other handcomputations for VF
119888s and VF
119907s as introduced here require
additional if straightforward programming for exampleMaple and PROC IML of the SAS System as utilized here
References
[1] D A Belsley ldquoDemeaning conditioning diagnostics throughcenteringrdquo The American Statistician vol 38 no 2 pp 73ndash771984
[2] E R Mansfield and B P Helms ldquoDetecting multicollinearityrdquoThe American Statistician vol 36 no 3 pp 158ndash160 1982
[3] S D Simon and J P Lesage ldquoThe impact of collinearityinvolving the intercept term on the numerical accuracy ofregressionrdquo Computer Science in Economics and Managementvol 1 no 2 pp 137ndash152 1988
[4] D W Marquardt and R D Snee ldquoRidge regression in practicerdquoThe American Statistician vol 29 no 1 pp 3ndash20 1975
[5] K N Berk ldquoTolerance and condition in regression computa-tionsrdquo Journal of the American Statistical Association vol 72 no360 part 1 pp 863ndash866 1977
[6] A E Beaton D Rubin and J Barone ldquoThe acceptability ofregression solutions another look at computational accuracyrdquoJournal of the American Statistical Association vol 71 no 353pp 158ndash168 1976
Advances in Decision Sciences 15
[7] R B Davies and B Hutton ldquoThe effect of errors in theindependent variables in linear regressionrdquo Biometrika vol 62no 2 pp 383ndash391 1975
[8] D W Marquardt ldquoGeneralized inverses ridge regressionbiased linear estimation and nonlinear estimationrdquo Technomet-rics vol 12 no 3 pp 591ndash612 1970
[9] G W Stewart ldquoCollinearity and least squares regressionrdquoStatistical Science vol 2 no 1 pp 68ndash84 1987
[10] R D Snee and D W Marquardt ldquoComment collinearitydiagnostics depend on the domain of prediction themodel andthe datardquo The American Statistician vol 38 no 2 pp 83ndash871984
[11] R HMyers Classical andModern Regression with ApplicationsPWS-Kent Boston Mass USA 2nd edition 1990
[12] D A Belsley E Kuh and R E Welsch Regression DiagnosticsIdentifying Influential Data and Sources of Collinearity JohnWiley amp Sons New York NY USA 1980
[13] A van der Sluis ldquoCondition numbers and equilibration ofmatricesrdquo Numerische Mathematik vol 14 pp 14ndash23 1969
[14] G C S Wang ldquoHow to handle multicollinearity in regressionmodelingrdquo Journal of Business Forecasting Methods amp Systemvol 15 no 1 pp 23ndash27 1996
[15] G C S Wang and C L Jain Regression Analysis Modeling ampForecasting Graceway Flushing NY USA 2003
[16] N Adnan M H Ahmad and R Adnan ldquoA comparative studyon some methods for handling multicollinearity problemsrdquoMatematika UTM vol 22 pp 109ndash119 2006
[17] D R Jensen and D E Ramirez ldquoAnomalies in the foundationsof ridge regressionrdquo International Statistical Review vol 76 pp89ndash105 2008
[18] D R Jensen and D E Ramirez ldquoSurrogate models in ill-conditioned systemsrdquo Journal of Statistical Planning and Infer-ence vol 140 no 7 pp 2069ndash2077 2010
[19] P F Velleman and R E Welsch ldquoEfficient computing ofregression diagnosticsrdquo The American Statistician vol 35 no4 pp 234ndash242 1981
[20] R F Gunst ldquoRegression analysis with multicollinear predictorvariables definition detection and effectsrdquo Communications inStatistics A vol 12 no 19 pp 2217ndash2260 1983
[21] G W Stewart ldquoComment well-conditioned collinearityindicesrdquo Statistical Science vol 2 no 1 pp 86ndash91 1987
[22] D A Belsley ldquoCentering the constant first-differencing andassessing conditioningrdquo in Model Reliability D A Belsley andE Kuh Eds pp 117ndash153 MIT Press Cambridge Mass USA1986
[23] G Smith and F Campbell ldquoA critique of some ridge regressionmethodsrdquo Journal of the American Statistical Association vol 75no 369 pp 74ndash103 1980
[24] R M Orsquobrien ldquoA caution regarding rules of thumb for varianceinflation factorsrdquoQualityampQuantity vol 41 no 5 pp 673ndash6902007
[25] R F Gunst ldquoComment toward a balanced assessment ofcollinearity diagnosticsrdquo The American Statistician vol 38 pp79ndash82 1984
[26] F A Graybill Introduction to Matrices with Applications inStatistics Wadsworth Belmont Calif USA 1969
[27] D Sengupta and P Bhimasankaram ldquoOn the roles of observa-tions in collinearity in the linearmodelrdquo Journal of the AmericanStatistical Association vol 92 no 439 pp 1024ndash1032 1997
[28] J O van Vuuren and W J Conradie ldquoDiagnostics and plotsfor identifying collinearity-influential observations and for clas-sifying multivariate outliersrdquo South African Statistical Journalvol 34 no 2 pp 135ndash161 2000
[29] R R HockingMethods and Applications of LinearModels JohnWiley amp Sons Hoboken NJ USA 2nd edition 2003
[30] R R Hocking and O J Pendleton ldquoThe regression dilemmardquoCommunications in Statistics vol 12 pp 497ndash527 1983
4 Advances in Decision Sciences
whether or not the intercept is itself of interest andwhether or not the data have been centeredrdquo [21 p90]
(v) An example [22 p121] gives severely ill-conditioneddata perfectly conditioned in centered form ldquocenter-ing alters neitherrdquo inflated variances nor extremelysensitive parameter estimates in the basic data more-over ldquodiagnosing the conditioning of the centereddata (which are perfectly conditioned) would com-pletely overlook this situation whereas diagnosticsbased on the basic data would notrdquo
(vi) To continue from [22] ill-conditioning persists inthe propagation of disturbances in that ldquoa 1 percentrelative change in theX
119894rsquos results in over a 40 relative
change in the estimatesrdquo despite perfect conditioningin centered form and ldquoknowledge of the effect ofsmall relative changes in the centered data is notmeaningful for assessing the sensitivity of the basic LSproblemrdquo since relative changes and their meaningsin centered anduncentered data oftendiffermarkedly
(vii) Regarding choice of origin ldquo(1) the investigator mustbe able to pick an origin against which small relativechanges can be appropriately assessed and (2) it is thedata measured relative to this origin that are relevantto diagnosing the conditioning of the LS problemrdquo[22 p126]
Other desiderata pertain
(i) ldquoBecause rewriting the model (in standardized vari-ables) does not affect any of the implicit estimates ithas no effect on the amount of information containedin the datardquo [23 p76]
(ii) Consequences of ill-advised diagnostics can besevere Degraded numerical accuracy traces to nearcollinearity of regressors with the constant vectorIn short centering fails to prevent a loss in numer-ical accuracy centered diagnostics are unable todiscern these potential accuracy problems whereasuncentered diagnostics are seen to work well Twowidely used statistical packages SAS and SPSS-Xfail to detect this type of ill-conditioning throughuse of centered diagnostics and thus return highlyinaccurate coefficient estimates For further detailssee [3]
On balance formodels in1198720the jury is out regarding the
use of centered or uncentered diagnostics to include VIFsEven Belsley [1] (and elsewhere) concedes circumstanceswhere centering does achieve structurally interpretable mod-els Of note is that the foregoing listed citations to Belsleyapply strictly to condition numbers 119888
2(X) other purveyors of
ill-conditioning specifically VIFs are not treated explicitly
34 A Synthesis It bears notice that (i) the origin how-ever remote from the cluster of regressors is essential forprediction and (ii) the prediction variance is invariant toparametrizing in centered or uncentered forms Additionalremarks are codified next for subsequent referral
Remark 2 Typically 119884 represents response to input vari-ables 119883
1 1198832 119883
119896 In a controlled experiment levels
are determined beforehand by subject-matter considerationsextraneous to the experiment to include minimal valuesHowever remote the origin on the basic data scales it seemsinformative in such circumstances to identify the originwith these minima In such cases the intercept is often ofsingular interest since 120573
0is then the standard against which
changes in 119884 are to be gaged as regressors vary We adopt thisconvention in subsequent studies from the archival literature
Remark 3 In summary the divergence in views whetherto center or not appears to be that critical aspects of ill-conditioning known and widely accepted for models in119872
119908
have been expropriated over decades mindlessly and withoutverification to apply point-by-point for models in119872
0
4 The Structure of Orthogonality
This section develops the foundations for Reference modelscapturing orthogonalities of types 119881
perp and 119862perp Essential
collateral results are given in support as well
41 Collinearity Indices Stewart [9] reexamines ill-condi-tioning from the perspective of numerical analysis Detailsfollow where X is a generic matrix of regressors havingcolumns x
119895 1 le 119895 le 119901 and Xdagger = (X1015840X)
minus1X1015840 is thegeneralized inverse of note having xdagger
119895 1 le 119895 le 119901 as
its typical rows Each corresponding collinearity index 120581 isdefined in [9 p72] as
120581119895=10038171003817100381710038171003817x119895
10038171003817100381710038171003817sdot10038171003817100381710038171003817xdagger119895
10038171003817100381710038171003817 1 le 119895 le 119901 (7)
constructed so as to be scale invariant Observe that xdagger1198952 is
found along the principal diagonal of [(Xdagger)(Xdagger)1015840] = (X1015840X)minus1
WhenX(119899times119896) inX0= [1119899X] is centered and scaled Section
3 of [9] shows that the centered collinearity indices 120581119888satisfy
1205812
119888119895= VIF
119888(120573119895) 1 le 119895 le 119896 In 119872
0with parameters (120573
0
120573) values corresponding to xdagger1198952 from Xdagger
0= (X10158400X0)minus1X10158400
lie along the principal diagonal of (X10158400X0)minus1 the uncentered
collinearity indices 120581119906now satisfy 120581
2
119906119895= x1198952sdot xdagger1198952
119895 =
0 1 119896 In particular since x0
= 1119899 we have 120581
2
1199060=
119899xdagger02 Moreover in119872
0it follows that the uncentered VIF
119906s
are squares of the collinearity indices that is VIF119906(120573119895) =
1205812
119906119895 119895 = 0 1 119896 Note the asymmetry that VIF
119888s exclude
the intercept in contrast to the inclusive VIF119906s That the
label Variance Inflation Factors for the latter is a misnomer iscovered in Remark 1 Nonetheless we continue the familiarnotation VIF
119906(120573119895) = 1205812
119906119895 119895 = 0 1 119896
Transcending the essential developments of [9] are con-nections between collinearity indices and angles between
Advances in Decision Sciences 5
subspaces To these ends choose a typical x119895in X0 and
rearrange X0as [x119895X[119895]] We next seek elements of
Q1015840119895(X10158400X0)minus1
Q119895= [
x1015840119895x119895
x1015840119895X[119895]
X1015840[119895]x119895X1015840[119895]X[119895]
]
minus1
(8)
as reordered by the permutation matrix Q119895 From the clock-
wise rule the (1 1) element of the inverse is
[x1015840119895(I119899minus P119895) x119895]minus1
= [x1015840119895x119895minus x1015840119895P119895x119895]minus1
=10038171003817100381710038171003817xdagger119895
10038171003817100381710038171003817
2 (9)
in succession for each 119895 = 0 1 119896 where P119895
=
X[119895][X1015840[119895]X[119895]]minus1X1015840[119895]
is the projection operator onto the sub-space S
119901(X[119895]) sub R119899 spanned by the columns of X
[119895] These
in turn enable us to connect 1205812119906119895 119895 = 0 1 119896 and similarly
for centered values 1205812119888119895 119895 = 1 119896 to the geometry of ill-
conditioning as follows
Theorem 4 For models in 1198720let VIF
119906(120573119895) = 120581
2
119906119895 119895 = 0
1 119896 be conventional uncentered VIF119906s in terms of Stew-
artrsquos [9] uncentered collinearity indices These in turn quantifythe extent of collinearities between subspaces through angles (in119889119890119892 as follows
(i) Angles between (x119895S119901(X[119895])) are given by 120579
119895=
arccos[(1minus11205812uj)12
] in succession for 119895 = 0 1 119896
(ii) In particular 1205790
= arccos[(1 minus 11205812u0)12
] quantifiesthe degree of collinearity between the regressors and theconstant vector
(iii) Similarly let B119899X = Z = [z
1 z
119896] be regressors
centered to their means rearrange as [z119895Z[119895]] and let
VIF119888(120573119895) = 120581
2
119888119895 119895 = 1 119896 be centered VIF
119888s in
terms of Stewartrsquos centered collinearity indices Thenangles (in deg) between (z
119895S119901(Z[119895])) are given by
120579119895= arccos[(1 minus 11205812cj)
12] 1 le j le k
Proof From the geometry of the right triangle formed by(x119895P119895x119895) the squared lengths satisfy x
1198952
= P119895x1198952+
x119895minus P119895x1198952 where x
119895minus P119895x1198952
= RS119895is the residual sum
of squares from the projection It follows that the principalangle between (x
119895P119895x119895) is given by
cos (120579119895)=
x1015840119895P119895x119895
10038171003817100381710038171003817x119895
10038171003817100381710038171003817sdot10038171003817100381710038171003817P119895x119895
10038171003817100381710038171003817
=
10038171003817100381710038171003817P119895x119895
1003817100381710038171003817100381710038171003817100381710038171003817x119895
10038171003817100381710038171003817
=radic1 minusRS119895
10038171003817100381710038171003817x119895
10038171003817100381710038171003817
2=radic1 minus
1
1205812119906119895
(10)
for 119895 = 0 1 119896 to give conclusion (i) Conclusion(ii) follows on specializing (x
0P0x0) with x
0= 1119899and
P0= X(X1015840X)
minus1X1015840 Conclusion (iii) follows similarly from thegeometry of the right triangle formed by (z
119895P119895z119895) where P
119895
now is the projection operator onto the subspace S119901(Z[119895]) sub
R119899 spanned by the columns of Z[119895] to complete our proof
Remark 5 Rules of thumb in common use for problematicVIFs include those exceeding 10 or even 4 see [11 24] forexample In angular measure these correspond respectivelyto 120579119895lt 18435 deg and 120579
119895lt 300 deg
42 Reference Models We seek as Reference feasible modelsencoding orthogonalities of types 119881
perp and 119862perp The keys
are as follows (i) to retain essentials of the experimentalstructure and (ii) to alter what may be changed to achieveorthogonality For a model in119872
119908with moment matrix X1015840X
our opening paragraph prescribes as reference the modelR119881
= D = Diag(11988911 119889
119901119901) as diagonal elements of
X1015840X for assessing 119881perp-orthogonality Moreover on scaling
columns of X to equal lengths R119881is perfectly conditioned
with 1198881(R119881) = 10 In addition every model in 119872
119908clearly
conforms with its Reference in the sense that R119881is positive
definite as distinct from models in1198720to follow
Consider again models in 1198720as in (6) let C = (X1015840X minus
M) withM = 119899XX1015840 as the mean-centering matrix and againletD = Diag(119889
11 119889
119896119896) comprise the diagonal elements of
X1015840X
(i) 119881perpReference ModelThe uncentered VIF119906s in119872
0 defined
as ratios of diagonal elements of (X10158400X0)minus1 to reciprocals of
diagonal elements of X10158400X0 appear to have seen exclusive
usage apparently in keepingwithRemark 3However the fol-lowing disclaimermust be registered as the formal equivalentof Remark 1
Theorem 6 Statements that conventional VIF119906s quantify var-
iance inflation owing to nondiagonalX10158400X0are false for models
in1198720having X = 0
Proof Since the Reference variances are reciprocals of diag-onal elements of X1015840
0X0 this usage is predicated on the false
postulate that X10158400X0can be diagonal for X = 0 Specifically
[1119899X] are linearly independent that is 11015840
119899X = 0 if and only
if X has been mean centered beforehand
To the contrary Gunst [25] purports to show thatVIF119906(1205730) registers genuine variance inflation namely the
price to be paid in variance for designing an experimenthaving X = 0 as opposed to X = 0 Since variances forintercepts are Var(sdot | X = 0) = 120590
2119899 and Var(sdot | X = 0) =
120590211988800
ge 1205902119899 from (6) their ratio is shown in [25] to be
1205812
1199060= 11989911988800
ge 1 in the parlance of Section 23 We concede thisto be a ratio of variances but to the contrary not a VIF sincethe parameters differ In particular Var(120573
0| X = 0) = 120590
211988800
whereas Var( | X = 0) = 1205902119899 with 120572 = 120573
0+ 12057311198831+
sdot sdot sdot + 120573119896119883119896in centered regressors Nonetheless we still find
the conventional VIF119906(1205730) to be useful for purposes to follow
Remark 7 Section 3 highlights continuing concern in regardto collinearity of regressors with the constant vectorTheorem 4(ii) and expression (10) support the use of 120579
0=
arccos[(1 minus 11205812u0)12
] as an informative gage on the extent of
6 Advances in Decision Sciences
this occurrence Specifically the smaller the angle the greaterthe extent of such collinearity
Instead of conventional VIF119906s given the foregoing dis-
claimer we have the following amended version as Referencefor uncentered diagnostics altering whatmay be changed butleaving X = 0 intact
Definition 8 Given a model in 1198720with second-moment
matrixX10158400X0 the amendedReferencemodel for assessing119881perp-
orthogonality is R119881
= [ 119899 119899X1015840
119899X D] with D = Diag(119889
11 119889
119896119896)
as diagonal elements of X1015840X provided that R119881is positive
definite We identify a model to be 119881perp-orthogonal when
X10158400X0= R119881
As anticipated a prospective R119881
fails to conform toexperimental data if not positive definite These and furtherprospects are covered in the following where 120601
119894designates
the angle between (1119899X119894) 119894 = 1 119896
Lemma 9 Take R119881
as a prospective Reference for 119881perp-
orthogonality
(i) In order that R119881maybe positive definite it is necessary
that 119899X1015840Dminus1X lt 1 that is that 119899sum119896
119894=1(1198832
119894119889119894119894) lt 1
(ii) Equivalently it is necessary thatsum119896119894=1
cos2(120601119894) lt 1with
120601119894as the angle between (1
119899Xi) 119894 = 1 119896
(iii) The Reference variance for 1205730is Var(120573
0| X1015840X = D) =
[119899(1 minus 119899X1015840Dminus1X)]minus1
(iv) The Reference variances for slopes are given by
Var (120573119894| X1015840X = D) =
1
119889119894119894
[1 +1205741198832
119894
119889119894119894
] 1 le 119894 le 119896 (11)
where 120574 = 119899(1 minus 119899X1015840Dminus1X)
Proof The clockwise rule for determinants gives |R119881| =
|D|[119899 minus 1198992X1015840Dminus1X] Conclusion (i) follows since |D| gt 0
The computation
cos (120601119894) =
11015840119899X119894
1003817100381710038171003817111989910038171003817100381710038171003817100381710038171003817X119894
1003817100381710038171003817=
11015840119899X119894
radic119899119889119894119894
= radic1198991198832
119894
119889119894119894
(12)
in parallel with (10) gives conclusion (ii) Using the clockwiserule for block-partitioned inverses the (1 1) element of Rminus1
119881
is given by conclusion (iii) Similarly the lower right block ofRminus1119881 of order (119896 times 119896) is the inverse of H = D minus 119899XX1015840 On
identifying a = b = X 120572 = 119899 and 119886lowast
119894= 119886119894119889119894119894 in Theorem
833 of [26] we have that Hminus1 = Dminus1 + 120574Dminus1XX1015840Dminus1Conclusion (iv) follows on extracting its diagonal elementsto complete our proof
Corollary 10 For the case 119896 = 2 in order that R119881maybe
positive definite it is necessary that 1206011+ 1206012gt 90 deg
Proof Beginning with Lemma 9(ii) compute
cos (1206011+ 1206012)
= cos1206011cos1206012minus sin120601
1sin1206012
= cos1206011cos1206012minus[1minus(cos2120601
1+cos2120601
2)+cos2120601
1cos21206012]12
(13)
which is lt0 when 1 minus (cos21206011+ cos2120601
2) gt 0
Moreover the matrix R119881itself is intrinsically ill conditioned
owing to X = 0 its condition number depending on X Toquantify this dependence we have the following wherecolumns of X have been standardized to common lengths|X119894| = 119888 1 le 119894 le 119896
Lemma 11 Let R119881= [ 119899 T1015840
T 119888I119896
] as in Definition 8 with T = 119899XD = 119888I
119896 and 119888 gt 0
(i) The ordered eigenvalues of R119881are [120582
1 119888 119888 120582
119901]
where 119901 = 119896 + 1 and (1205821 120582119901) are the roots of 120582 isin
[(119899 + 119888) plusmn radic(119899 minus 119888)2+ 41205912]2 and 120591
2= T1015840T
(ii) The roots are positive andR119881is positive definite if and
only if (119899119888 minus 1205912) gt 0
(iii) If R119881
is positive definite its condition number is1198881(R119881) = 1205821120582119901and is increasing in 120591
2
Proof Eigenvalues are roots of the determinantal equation
Det [(119899 minus 120582) T1015840T (119888 minus 120582) I
119896
]
= 0 lArrrArr (119888 minus 120582)119896minus1
[(119899 minus 120582) (119888 minus 120582) minus T1015840T] = 0
(14)
from the clockwise rule giving (119896 minus 1) values 120582 = 119888 and tworoots of the quadratic equation 120582
2minus (119899 + 119888)120582 + 119899119888 minus 120591
2= 0
to give conclusion (i) Conclusion (ii) holds since the productof roots of the quadratic equation is (119899119888 minus 120591
2) and the greater
root is positive Conclusion (iii) follows directly to completeour proof
(119894119894) 119862perp Reference Model As noted the notion of 119862
perp-or-thogonality applies exclusively for models in 119872
0 Accord-
ingly as Reference we seek to alter X1015840X so that the matrixcomprising sums of squares and sums of products of devia-tions from means thus altered is diagonal To achieve thiscanon of 119862
perp-orthogonality and to anticipate notation forsubsequent use we have the following
Definition 12 (i) For a model in 1198720with second-moment
matrix X10158400X0and inverse as in (6) let C
119908= (W minus M) and
identify an indicator vector G = [11989212 11989213 119892
1119896 11989223
1198922119896 119892
119896minus1119896] of order ((119896 minus 1) times (119896 minus 2)) in lexicographic
order where 119892119894119895= 0 119894 = 119895 if the (119894 119895) element of C
119908is zero
and 119892119894119895= 1 119894 = 119895 if the (119894 119895) element ofC
119908is (X1015840119894X119895minus119899119883119894119883119895)
that is unchanged from C = (X1015840X minus 119899XX1015840) = Z1015840Z withZ = B
119899X as the regressors centered to their means
Advances in Decision Sciences 7
(ii) In particular the Reference model in1198720for assessing
119862perp-orthogonality is R
119862= [ 119899 119899X
1015840
119899X W] such that C
119908= (W minus
M) and its inverse G from (6) are diagonal that is takingW = [119908
119894119895] such that 119908
119894119894= 119889119894119894 1 le 119894 le 119896 and 119908
119894119895=
119898119894119895 for all 119894 = 119895 or equivalently G = [0 0 0] In this
case we identify the model to be 119862perp-orthogonal
Recall that conventional VIF119888s for [120573
1 1205732 120573
119896] are
ratios of diagonal elements of (Z1015840Z)minus1 to reciprocals ofthe diagonal elements of the centered Z1015840Z Apparently thisrests on the logic that 119862perp-orthogonality in 119872
0implies that
Z1015840Z is diagonal However the converse fails and instead isembodied in R
119862isin 1198720 In consequence conventional VIF
119888s
are deficient in applying only to slopes whereas the VF119888s
resulting from Definition 12(ii) apply informatively for all of[1205730 1205731 1205732 120573
119896]
Definition 13 Designate VF119907(sdot) and VF
119888(sdot) as Variance Fac-
tors resulting from Reference models of Definition 8 andDefinition 12(ii) respectively On occasion these representVariance Deflation in addition to VIFs
Essential invariance properties pertain ConventionalVIF119888s are scale invariant see Section 3 of [9]We see next that
they are translation invariant as well To these ends we shiftthe columns of X=[X
1X2 X
119896] to a new origin through
X rarr X minus L = H where L = [11988811119899 11988821119899 119888
1198961119899] The
resulting model is
Y = (12057301119899+ L120573) +H120573 + 120598
lArrrArr Y = (1205730+ 120595) 1
119899+H120573 + 120598
(15)
thus preserving slopes where120595 = c1015840120573=(11988811205731+11988821205732+sdot sdot sdot+119888
119896120573119896)
Corresponding to X0= [1119899X] is U
0= [1119899H] and to X1015840
0X0
and its inverse are
U10158400U0= [
119899 11015840119899H
H10158401119899
H1015840H] (U10158400U0)minus1
= [11988700
b01
b101584001
G ] (16)
in the form of (6)This pertains to subsequent developmentsand basic invariance results emerge as follows
Lemma 14 Consider Y = 12057301119899+ X120573 + 120598 together with the
shifted version Y = (1205730+ 120595)1
119899+ H120573 + 120576 both in 119872
0 Then
the matrices G appearing in (6) and (16) are identical
Proof Rules for block-partitioned inverses again assert thatG of (16) is the inverse of
(H1015840H minus 119899minus1H10158401
11989911015840119899H) = H1015840 (I
119899minus 119899minus1111989911015840119899)H
= H1015840B119899H = X1015840B
119899X
(17)
since B119899L = 0 to complete our proof
These facts in turn support subsequent claims that cen-tered VIF
119888s are translation and scale invariant for slopes
1015840
=
[1205731 1205732 120573
119896] apart from the intercept 120573
0
43 A Critique Again we distinguish VIF119888s and VIF
119906s from
centered and uncentered regressorsThe following commentsapply
(C1) A design is either 119881perp or 119862perp-orthogonal respectivelyaccording as the lower right (119896times119896) blockX1015840X ofX1015840
0X0
orC = (X1015840Xminus119899XX1015840) from expression (6) is diagonalOrthogonalities of type 119881perp and 119862
perp are exclusive andhence work at crossed purposes
(C2) In particular 119862perp-orthogonality holds if and only ifthe columns of X are 119881
perp-nonorthogonal If X is 119862perp-orthogonal then for 119894 = 119895 C
119908(119894119895) = X1015840
119894X119895minus 119899119883119894119883119895=
0 and X1015840119894X119895
= 0 Conversely if X indeed is 119881perp-
orthogonal so that X1015840X = D= Diag(11988911 119889
119896119896)
then (D minus 119899XX1015840) cannot be diagonal as Reference inwhich case the conventional VIF
119888s are tenuous
(C3) Conventional VIF119906s based on uncentered X in 119872
0
with X = 0 do not gage variance inflation as claimedfounded on the false tenet that X1015840
0X0can be diagonal
(C4) To detect influential observations and to classify highleverage points casendashinfluence diagnostics namely119891119895119868
= VIF119895(minus119868)
VIF119895 1 le 119895 le 119896 are studied in [27]
for assessing the impact of subsets on variance infla-tion Here VIF
119895is from the full data and VIF
119895(minus119868)on
deleting observations in the index set 119868 Similarly [28]proposes using VIF
(119894)on deleting the 119894th observation
In the present context these would gain substance onmodifying VF
119907and VF
119888accordingly
5 Case Study 1 Continued
51 The Setting We continue an elementary and transparentexample to illustrate essentials Recall119872
0 119884119894= 1205730+12057311198831+
12057321198832+120598119894 of Section 22 the designX
0= [15X1X2] of order
(5 times 3) and U = X10158400X0 and its inverse V = (X10158400X0)minus1 as in
expressions (1) The design is neither 119881perp nor 119862perp orthogonalsince neither the lower right (2 times 2) block of X1015840
0X0nor the
centered Z1015840Z is diagonal Moreover the uncentered VIF119906s as
listed in (3) are not the vaunted relative increases in variancesowing to nonorthogonal columns of X
0 Indeed the only
opportunity for 119881perp-orthogonality here is between columns
[X1X2]
Nonetheless from Section 41 we utilize Theorem 4(i)and (10) to connect the collinearity indices of [9] to anglesbetween subspaces Specifically Minitab recovered valuesfor the residuals RS
119895= x
119895minus P119895x1198952 119895 = 0 1 2 fur-
ther computations proceed routinely as listed in Table 1 Inparticular the principal angle 120579
0between the constant vector
and the span of the regressor vectors is 1205790= 31751 deg as
anticipated in Remark 7
52 119881perp-Orthogonality For subsequent reference let 119863(119909) bea design as in (1) but with second-moment matrix
U (119909) = [
[
5 3 1
3 25 119909
1 119909 3
]
]
(18)
8 Advances in Decision Sciences
Table 1 Values for residuals RS119895= x119895minus P119895x1198952 for X
1198952 for 1205812
119906119895= VI F
119906(120573119895) for (1 minus 1120581
2
119906119895)12 and for 120579
119895 for 119895 = 0 1 2
119895 RS119895
X1198952
1205812
119895(1 minus 1120581
2
119906119895)12
120579119895(deg)
0 13846 50 36111 08503 317511 06429 25 38889 08619 304702 25714 30 11667 03780 67792
Invoking Definition 8 we seek a 119881perp-orthogonal Reference
with moment matrix U(0) having X10158401X2
= 0 that is 119863(0)This is found constructively on retaining [1
5 ∙X2] in X
0
but replacing X1by X01
= [1 05 05 1 0]1015840 as listed in
Table 2 giving the design 119863(0) with columns [15X01X2] as
ReferenceAccordingly R
119881= U(0) is ldquoas 119881
perp-orthogonal as it cangetrdquo given the lengths and sums of (X
1X2) as prescribed
in the experiment and as preserved in the Reference modelLemma 9 with X=[06 02]
1015840 and D= Diag(25 3) givesX1015840Dminus1X = 0157333 and 120574 = 5(1 minus 5X1015840Dminus1X) =
234375 Applications of Lemma 9(iii)-(iv) in succession givethe reference variances
Var (1205730| X1015840X = D) = [5 (1 minus 5X1015840Dminus1X)]
minus1
= 09375
Var (1205731|X1015840X=D)=
1
11988911
[1+1205741198832
1
11988911
]=2
5[1+
072120574
5]=17500
Var (1205732|X1015840X=D)=
1
11988922
[1+1205741198832
2
11988922
]=1
3[1+
004120574
3]=04375
(19)
As Reference these combine with actual variances from119881 = [119881
119894119895] at (1) giving VF
119907for the original designX
0relative
to R119881 For example VF
119907(1205730) = 0722209375 = 07704 with
11988111
= 07222 from (1) and 09375 from (19) This contrastswith VIF
119906(1205730) = 36111 in (3) as in Section 22 and Table 2
it is noteworthy that the nonorthogonal design 119863(1) yieldsuniformly smaller variances than the 119881
perp-orthogonal R119881
namely119863(0)Further versions 119863(119909) 119909 isin [minus05 00 05 06 10 15]
of our basic design are listed for comparison in Table 2 where119909 = X1015840
1X2for the various designs The designs themselves are
exhibited on transposing rowsX10158401= [11988311 11988312 11988313 11988314 11988315]
from Table 2 and substituting each into the template X00
=
[15 ∙X2] The design 119863(2) was constructed but not listed
since its X10158400X0is not invertible Clearly these are actual
designs amenable to experimental deployment
53 119862perp-Orthogonality Continuing X0
= [15X1X2] and
invoking Definition 12(ii) we seek a 119862perp-orthogonal
Reference having the matrix C119908
as diagonal This is
identified as 119863(06) in Table 2 From this the matrix R119862and
its inverse are
R119862= [
[
5 3 1
3 25 06
1 06 3
]
]
Rminus1119862
= [
[
07286 minus08572 minus00714
minus08571 14286 00
minus00714 00 03571
]
]
(20)
The variance factors are listed in (4) where for exampleVF119888(1205732) = 0388903571 = 10889 As distinct from conven-
tional VIF119888s for 120573
1and 120573
2only our VF
119888(1205730) here reflects
Variance Deflationwherein 1205730is estimated more precisely in
the initial 119862perp-nonorthogonal designA further note on orthogonality is germane Suppose
that the actual experiment is 119863(0) yet towards a thoroughdiagnosis the user evaluates the conventional VIF
119888(1205731) =
12250 = VIF119888(1205732) as ratios of variances of 119863(0) to 119863(06)
from Table 2 Unfortunately their meaning is obscured sincea design cannot at once be 119881perp and 119862
perp-orthogonalAs in Remark 2 we next alter the basic designX
0at (1) on
shifting columns of themeasurements to [0 0] asminima andscaling to have squared lengths equal to 119899 = 5 The resultingX0follows directly from (1) The new matrix U = X
1015840
0X0 andits inverse V are
U = [
[
5 42426 42426
42426 5 4
42426 4 5
]
]
V = [
[
1 minus04714 minus04714
minus04714 07778 minus02222
minus04714 minus02222 07778
]
]
(21)
giving conventional VIF119906s as [5 38889 38889] Against 119862perp-
orthogonality this gives the diagnostics
[VF119888(1205730) VF119888(1205731) VF119888(1205732)] = [08140 10889 10889]
(22)
demonstrating in comparison with (4) that VF119888s apart from
1205730 are invariant under translation and scaling a consequence
of Lemma 14We further seek to compare variances for the shifted
and scaled (X1X2) against 119881perp-orthogonality as Reference
However from U = X10158400X0at (21) we determine that 119883
1=
084853 = 1198832and 5[(119883
2
15) + (119883
2
25)] = 14400 Since
the latter exceeds unity we infer from Lemma 9(i) that
Advances in Decision Sciences 9
Table 2 Designs 119863(119909) 119909 isin [minus05 00 05 06 10 15] obtained by substitutingX01= [11988311 11988312 11988313 11988314 11988315]1015840 in [15 ∙X2] of order (5times3)
and corresponding variances Var (1205730)Var (120573
1)Var (120573
2)
Design 11988311
11988312
11988313
11988314
11988315
Var (1205730) Var (120573
1) Var (120573
2)
119863(minus05) 10 00 05 10 05 19333 37333 09333119863(00) 10 05 05 10 00 09375 17500 04375119863(05) 05 10 00 10 05 07436 14359 03590119863(06) 06091 05793 06298 11819 0000 07286 14286 03570119863(10) 00 05 05 10 10 07222 15556 03889119863(15) 00 10 05 05 10 09130 24348 06087
119881perp-orthogonality is incompatible with this configuration
of the data so that the VF119907s are undefined Equivalently
on evaluating 1206011
= 1206012
= 2290 deg and 1206011+ 1206012
=
4580 lt 90 deg Corollary 10 asserts that R119881is not positive
definite This appears to be anomalous until we recallthat 119881perp-orthogonality is invariant under rescaling but notrecentering the regressors
54 A Critique We reiterate apparent ambiguity ascribedin the literature to orthogonality A number of widely heldguiding precepts has evolved as enumerated in part in ouropening paragraph and in Section 3 As in Remark 3 theseclearly have been taken to apply verbatim forX
0andX1015840
0X0in
1198720 to be paraphrased in part as follows(P1) Ill-conditioning espouses inflated variances that is
VIFs necessarily equal or exceed unity(P2) 119862perp-orthogonal designs are ldquoidealrdquo in that VIF
119888s for
such designs are all unity see [11] for exampleWe next reassess these precepts as they play out under119881perp and 119862
perp orthogonality in connection with Table 2(C1) For models in 119872
0having X = 0 we reiterate that
the uncentered VIF119906s at expression (3) namely
[36111 38889 11667] overstate adverse effects onvariances of the 119881perp-nonorthogonal array since X1015840
0X0
cannot be diagonal Moreover these conventionalvalues fail to discern that the revised values for VF
119907s
at (5) namely [07704 08889 08889] reflect that[1205730 1205731 1205732] are estimated with greater efficiency in the
119881perp-nonorthogonal array119863(1)
(C2) For 119881perp-orthogonality the claim P1 thus is false by
counterexampleTheVFs at (5) areVarianceDeflationFactors for design 119863(1) where X1015840
1X2= 1 relative to
the 119881perp-orthogonal119863(0)(C3) Additionally the variances Var[120573
0(119909)] Var[120573
1(119909)]
Var[1205732(119909)] in Table 2 are all seen to decrease from
119863(0) [09375 17500 04375] rarr 119863(1) [0722215556 03889] despite the transition from the 119881
perp-orthogonal 119863(0) to the 119881
perp-nonorthogonal 119863(1)Similar trends are confirmed in other ill-conditioneddata sets from the literature
(C4) For 119862perp-orthogonal designs the claim P2 is false by
counterexample Such designs need not be ldquoidealrdquoas 1205730may be estimated more efficiently in a 119862
perp-nonorthogonal design as demonstrated by the VF
119888s
for 1205730at (4) and (22) Similar trends are confirmed
elsewhere as seen subsequently
(C5) VFs for 1205730are critical in prediction where prediction
variances necessarily depend on Var(1205730) especially
for predicting near the origin of the system of coor-dinates
(C6) Dissonance between 119881perp and 119862
perp is seen in Table 2where 119863(06) as 119862
perp-orthogonal with X10158401X2
= 06is the antithesis of 119881perp-orthogonality at 119863(0) whereX10158401X2= 0
(C7) In short these transparent examples serve to dispelthe decades-old mantra that ill-conditioning neces-sarily spawns inflated variances formodels in119872
0 and
they serve to illuminate the contributing structures
6 Orthogonal and Linked Arrays
A genuine 119881perp-orthogonal array was generated as eigenvec-
tors E = [E1E2 E
8] from a positive definite (8 times 8)
matrix (see Table 83 of [11 p377]) using PROC IML of theSAS programming package The columns [X
1X2X3] as the
second through fourth columns of Table 3 comprise the firstthree eigenvectors scaled to length 8 These apply for modelsin 119872119908and 119872
0to be analyzed In addition linked vectors
(Z1Z2Z3) were constructed as Z
1= 119888(09E
1+ 01E
2)
Z2
= 119888(09E2+ 01E
3) and Z
3= 119888(09E
3+ 01E
4) where
119888 = radic8082 Clearly these arrays as listed in Table 3 are notarchaic abstractions as both are amenable to experimentalimplementation
61 The Model 1198720 We consider in turn the orthogonal and
the linked series
611 Orthogonal Data Matrices X10158400X0and (X1015840
0X0)minus1 for the
orthogonal data under model1198720are listed in Table 4 where
variances occupy diagonals of (X10158400X0)minus1 The conventional
uncentered VIF119906s are [120296 102144 112256 105888]
Since X = 0 we find the angle between the constant andthe span of the regressor vectors to be 120579
0= 65748 deg as
in Theorem 4(ii) Moreover the angle between X1and the
span of [1119899X2X3] namely 120579
1= 81670 deg is not 90 deg
because of collinearity with the constant vector despite themutual orthogonality of [X
1X2X3] Observe here thatX1015840
0X0
10 Advances in Decision Sciences
Table 3 Basic 119881perp-orthogonal design X0 = [18 X1 X2 X3] and a linked design Z0 = [18 Z1 Z2 Z3] of order (8 times 4)
Orthogonal (X1X2X3) Linked (Z
1Z2Z3)
18
X1
X2
X3
18
Z1
Z2
Z3
1 minus1084470 0056899 0541778 1 minus1071550 0116381 06534861 minus1090010 minus0040490 0117026 1 minus1087820 minus0027320 01160861 0979050 0002564 2337454 1 0973345 0260677 21996871 minus1090230 0016839 0170711 1 minus1081700 0035588 00706941 0127352 2821941 minus0071670 1 0438204 2796767 minus00707101 1045409 minus0122670 minus1476870 1 1025468 minus0285010 minus16185601 1090803 minus0088970 0043354 1 1074306 minus0083640 01769281 1090739 minus0092300 0108637 1 1073875 minus0079740 0244567
Table 4 Matrices X10158400X0 and (X10158400X0)minus1 for the orthogonal data under model119872
0
X10158400X0
(X10158400X0)minus1
8 10686 25538 17704 01504 minus00201 minus00480 minus0033310686 8 0 0 minus00201 01277 00064 0004425538 0 8 0 minus00480 00064 01403 0010617704 0 0 8 minus00333 00044 00106 01324
already is 119881perp-orthogonal accordingly R119881= X10158400X0 and thus
the VF119907s are all unity
In view of dissonance between 119881perp and 119862
perp orthogonalitythe sums of squares and products for the mean-centered Xare
X1015840B119899X = [
[
78572 minus03411 minus02365
minus03411 71847 minus05652
minus02365 minus05652 76082
]
]
(23)
reflecting nonnegligible negative dependencies to distin-guish the 119881
perp-orthogonal X from the 119862perp-nonorthogonal
matrix Z = B119899X of deviations
To continue we suppose instead that the 119881perp-orthogonalmodel was to be recast as 119862
perp-orthogonal The Referencemodel and its inverse are listed in Table 5 Variance factorsas ratios of diagonal elements of (X1015840
0X0)minus1 in Table 4 to those
of Rminus1119862
in Table 5 are
[VF119888(1205730) VF119888(1205731) VF119888(1205732) VF119888(1205733)]
= [10168 10032 10082 10070]
(24)
reflecting negligible differences in precision This parallelsresults reported in Table 2 comparing 119863(0) to 119863(06) asReference except for larger differences in Table 2 namely[128681225012250]
612 Linked Data For the Linked Data under model 1198720
the matrix Z10158400Z0 follows routinely from Table 3 its inverse(Z10158400Z0)minus1 is listed in Table 6 The conventional uncentered
VIF119906s are now [120320 103408 113760 105480] These
again fail to gage variance inflation since Z10158400Z0cannot be
diagonal From (10) the principal angle between the constantvector and the span of the regressor vectors is 120579
0= 65735
degrees
Remark 15 Observe that the choice for R119862may be posed as
seeking R119862
= U(119909) such that Rminus1119862(2 3) = 00 then solving
numerically for 119909 using Maple for example However thealgorithm in Definition 12(ii) affords a direct solution thatC119908should be diagonal stipulates its off-diagonal element as
X10158401X2minus 35 = 0 so that X1015840
1X2= 06 in R
119862at expression (20)
To illustrate Theorem 4(iii) we compute cos(120579) = (1 minus
110889)12 = 02857 and 120579 = 73398 deg as the angle between
the vectors [X1X2] of (1) when centered to their means For
slopes in 1198720 [9] shows that VIF
119888(120573119895) lt VIF
119906(120573119895) 1 le
119894 le 119896 This follows since their numerators are equal butdenominators are reciprocals of lengths of the centered anduncentered regressor vectors To illustrate numerators areequal for VIF
119906(1205733) = 03889(13) = 11667 and for
VIF119888(1205733) = 03889(128) = 10889 but denominators are
reciprocals of X10158402X2= 3 and sum
5
119895=1(1198832119895
minus 1198832)2= 28
(119894) 119881perp Reference Model To rectify this malapropism we seek
in Definition 8 a model R119881as Reference This is found on
setting all off-diagonal elements of Z10158400Z0to zero excluding
the first row and column Its inverse Rminus1119881
is listed in Table 6The VF
119907s are ratios of diagonal elements of (Z1015840
0Z0)minus1 on the
left to diagonal elements of Rminus1119881
on the right namely
[VF119907(1205730) VF119907(1205731) VF119907(1205732) VF119907(1205733)]
= [09697 09991 09936 09943]
(25)
Against conventional wisdom it is counter-intuitive that[1205730 1205731 1205732 1205733] are all estimated with greater precision in the
119881perp-nonorthogonal model Z1015840
0Z0than in the 119881
perp-orthogonalmodel R
119881of Definition 8 As in Section 54 this serves again
to refute the tenet that ill-conditioning espouses inflated
Advances in Decision Sciences 11
Table 5 Matrices R119862and Rminus1
119862for checking 119862
perp-orthogonality in the orthogonal data X0 under model1198720
R119862
Rminus1119862
8 10686 25538 17704 01479 minus00170 minus00444 minus0029110686 8 03411 02365 minus00170 01273 0 025538 03411 8 05652 minus00444 0 01392 017704 02365 05652 8 minus00291 0 0 01314
Table 6 Matrices (Z10158400Z0)minus1 and Rminus1
119881for checking 119881
perp-orthogonality for the linked data under model1198720
(Z10158400Z0)minus1 Rminus1
119881
01504 minus00202 minus00461 minus00283 01551 minus00261 minus00530 minus00344minus00202 01293 minus000797 00053 minus00261 01294 00089 00058minus00461 minus00079 01422 minus00054 minus00530 00089 01431 00117minus00283 00053 minus00054 01319 minus00344 00058 00117 01326
variances for models in1198720 that is that VFs necessarily equal
or exceed unity(119894119894) 119862
perp Reference Model Further consider variances werethe Linked Data to be centered As in Definition 12(ii)we seek a Reference model R
119862such that C
119908= (W minus
M) is diagonal The required R119862
and its inverse arelisted in Table 7 Since variances in the Linked Data are[015040 012926 014220 013185] from Table 6 and Refer-ence values appear as diagonal elements of Rminus1
119862on the right
in Table 7 their ratios give the centered VF119888s as
[VF119888(1205730) VF119888(1205731) VF119888(1205732) VF119888(1205733)]
= [09920 10049 10047 10030]
(26)
We infer that changes in precision would be negligible ifinstead the Linked Data experiment was to be recast as a 119862perp-orthogonal experiment
As in Remark 15 the reference matrix R119862is constrained
by the first and second moments from X10158400X0and has free
parameters 1199091 1199092 1199093 for the (2 3) (2 4) (3 4) entries
The constraints for 119862perp-orthogonality are Rminus1
119862(2 3) =
Rminus1119862(2 4) = Rminus1
119862(3 4) = 0 Although this system of
equations can be solved with numerical software such asMaple the algorithm stated in Definition 12 easily yields thedirect solution given here
62 The Model 119872119908 For the Linked Data take 119884
119894= 12057311198851198941+
12057321198851198942
+ 12057331198851198943
+ 120598119894 1 le 119894 le 8 as Y = Z120573 + 120598
where the expected response at 1198851
= 1198852
= 1198853
= 0
is 119864(119884) = 00 and there is no occasion to translate theregressorsThe lower right (3 times 3) submatrix ofZ1015840
0Z0(4 times 4)
from 1198720is Z1015840Z its inverse is listed in Table 8 The ratios
of diagonal elements of (Z1015840Z)minus1 to reciprocals of diagonalelements of Z1015840Z are the conventional uncentered VIF
119906s
namely [10123 10247 10123] Equivalently scaling Z1015840Z tohave unit diagonals and inverting gives Cminus1 as in Table 8 itsdiagonal elements are the VIF
119906sThroughout Section 6 these
comprise the only correct interpretation of conventionalVIF119906s as genuine variance ratios
7 Case Study Body Fat Data
71 The Setting Body fat and morphogenic measurementswere reported for 119899 = 20 human subjects in Table D11 of[29 p717] Response and regressor variables are119884 amount ofbody fat X
1 triceps skinfold thickness X
2 thigh circumfer-
ence and X3 mid-arm circumference under119872
0 119884119894= 1205730+
12057311198831+12057321198832+12057331198833+ 120598119894 From the original data as reported
the condition number of X10158400X0is 1039931 and the VIF
119906s are
[2714800 4469419 630998 778703] On scaling columnsof X to equal column lengths X
119894 = 50 119894 = 1 2 3 the
resulting condition number of the revisedX10158400X0is 2 9696518
and the uncenteredVIF119906s are as before from invariance under
scalingAgainst complaints that regressors are remote from their
natural origins we center regressors to [0 0 0] as origin onsubtracting the minimal element from each column as inRemark 2 and scaling these to X
119894 = 50 119894 = 1 2 3
This is motivated on grounds that centered diagnostics apartfrom VIF
119888(1205730) are invariant under shifts and scalings from
Lemma 14 In addition under the new origin [0 0 0] 1205730
assumes prominence as the baseline response against whichchanges due to regressors are to be gaged The original dataare available on the publisherrsquos web page for [29]
Taking these shifted and scaled vectors as columnsof X = [X
1X2X3] in the revised X
0= [1
119899X]
values for X10158400X0
and its inverse are listed in Table 9The condition number of X1015840
0X0
is now 1136969 andits VIF
119906s are [6775617998742782174484] Additionally
from Theorem 4(ii) we recover the angle 1205790= 22592 deg to
quantify collinearity of regressors with the constant vectorIn addition the scaled and centered correlationmatrix for
X is
R = [
[
10 00847 08781
00847 10 01424
08781 01424 10
]
]
(27)
indicating strong linkage between mean-centered versions of(X1X3)Moreover the experimental design is neither119881perp nor
119862perp-orthogonal since neither the lower right (3 times 3) block of
X10158400X0nor the centered matrixC = (X1015840Xminus119899XX1015840) is diagonal
12 Advances in Decision Sciences
Table 7 Matrices R119862and Rminus1
119862for checking 119862
perp-orthogonality in the linked data under model1198720
R119862
Rminus1119862
8 13441 27337 17722 01516 minus00216 minus00484 minus0029113441 8 04593 02977 minus00216 01286 0 027337 04593 8 06056 minus00484 0 01415 017722 02977 06056 8 minus00291 0 0 01314
Table 8 Matrices (Z1015840Z)minus1 and Cminus1 for the linked data under model119872119908
(Z1015840Z)minus1 Cminus1
01265 minus00141 00015 10123 minus01125 00123minus00141 01281 minus00141 minus01125 10247 minus0112500015 minus00141 01265 00123 minus01125 10123
(119894) 119881perp Reference Model To compare this with a 119881
perp-or-thogonal design we turn again to Definition 8 Howeverevaluating the test criterion in Lemma 9(119894) shows that thisconfiguration is not commensurate with 119881
perp-orthogonalityso that the VF
119907s are undefined Comparisons with 119862
perp-orthogonal models are addressed next(119894119894) 119862perp Reference Model As in Definition 12(ii) the centering
matrix is
119899XX1015840 = [
[
188889 189402 187499
189402 189916 188007
187499 188007 186118
]
]
(28)
To check against 119862perp-orthogonality we obtain the matrix Wof Definition 12(ii) on replacing off-diagonal elements of thelower right (3 times 3) submatrix ofX1015840
0X0by corresponding off-
diagonal elements of 119899XX1015840 The result is the matrix R119862and
its inverse as given in Table 10 where diagonal elements ofRminus1119862
are the Reference variances The VF119888s found as ratios
of diagonal elements of (X10158400X0)minus1 relative to those of Rminus1
119862
are listed in Table 12 under G = [0 0 0] the indicator ofDefinition 12(i) for this case
72 Variance Factors and Linkage Traditional gages of ill-conditioning are patently absurd on occasion By conventionVIFs are ldquoall or nonerdquo wherein Reference models entailstrictly diagonal components In practice some regressorsare inextricably linked to get or even visualize orthogonalregressors may go beyond feasible experimental rangesrequire extrapolation beyond practical limits and challengecredulity Response functions so constrained are described in[30] as ldquopicket fencesrdquo In such cases taking C to be diagonalas its 119862perp Reference is moot at best an academic folly abettedin turn by default in standard software packages In shortconventional diagnostics here offer answers to irrelevantquestions Given pairs of regressors irrevocably linked weseek instead to assess effects on variances for other pairsthat could be unlinked by design These comments applyespecially to entries under G = [0 0 0] in Table 12
To proceed we define essential ill-conditioning as regres-sors inextricably linked necessarily to remain so and asnonessential if regressors could be unlinked by design As a
followup to Section 32 for X = 0 we infer that the constantvector is inextricably linked with regressors thus accountingfor essential ill-conditioning not removed by centeringcontrary to claims in [4] Indeed this is the essence ofDefinition 8
Fortunately these limitations may be remedied throughReference models adapted to this purpose This in turnexploits the special notation and trappings of Definition 12(i)as follows In addition toG = [0 0 0] we iterate for indicatorsG = [119892
12 11989213 11989223] taking values in [1 0 0] [0 1 0] [0 0 1]
[1 1 0] [1 0 1] [0 1 1] Values of VF119888s thus obtained are
listed in Table 12 To fix ideas the Reference R119862for G =
[1 0 0] that is for the constraint R119862(2 3) = X1015840
0X0(2 3) =
194533 is given in Table 11 together with its inverse Foundas ratios of diagonal elements of (X1015840
0X0)minus1 relative to those of
Rminus1119862
in Table 11 VF119888s are listed in Table 12 underG = [1 0 0]
In short R119862is ldquoas 119862
perp-orthogonal as it could berdquo given theconstraint Other VF
119888s in Table 12 proceed similarly For all
cases in Table 12 excluding G = [0 1 1] 1205730is estimated
with greater efficiency in the original model than any of thepostulated Reference models
As in Remark 15 the reference matrix R119862
for G =
[1 0 0] is constrained by X10158401X2
= 194533 to be retainedwhereas X1015840
1X3
= 1199091and X1015840
2X3
= 1199092are to be determined
from Rminus1119862(2 4) = Rminus1
119862(3 4) = 0 Although this system
can be solved numerically as noted the algorithm stated inDefinition 12 easily yields the direct solution given here
Recall 1198831 triceps skinfold thickness 119883
2 thigh cir-
cumference and 1198833 mid-arm circumference and from
(27) that (1198831 1198833) are strongly linked Invoking the cutoff
rule [24] for VIFs in excess of 40 it is seen for G =
[0 0 0] in Table 12 that VF119888(1205731) and VF
119888(1205733) are excessive
where [X1X2X3] are forced to be mutually uncorrelated
as Reference Remarkably similar values are reported forG isin [1 0 0] [ 0 0 1] [1 0 1] all gaged against intractableReference models wherein (119883
1 1198833) are postulated to be
uncorrelated however incrediblyOn the other hand VF
119888s for G isin [0 1 0] [1 1 0]
[0 1 1] in Table 12 are likewise comparable where (X1X3)
are allowed to remain linked at X10158401X3= 242362 Here the
VF119888s reflect negligible changes in efficiency of the estimates
Advances in Decision Sciences 13
Table 9 Matrices X10158400X0 and (X10158400X0)minus1 for the body fat data under model119872
0centered to minima and scaled to equal lengths
X10158400X0
(X10158400X0)minus1
20 194365 194893 192934 03388 minus01284 minus01482 minus00203194365 25 194533 242362 minus01284 07199 00299 minus06224194893 194533 25 196832 minus01482 00299 01711 minus00494192934 242362 196832 25 minus00203 minus06224 minus00494 06979
Table 10 Matrices R119862and Rminus1
119862for the body fat data adjusted to give off-diagonal elements the value 00 with code G = [0 0 0]
R119862
Rminus1119862
20 194365 194893 192934 05083 minus01590 minus01622 minus01510194365 25 189402 187499 minus01590 01636 0 0194893 189402 25 188007 minus01622 0 01664 0192934 187499 188007 25 minus01510 0 0 01565
in comparison with the originalX10158400X0 where [X
1X2X3] are
all linked In summary negligible changes in overall efficiencywould accrue on recasting the experiment so that (X
1X2)
and (X2X3) were pairwise uncorrelated
Table 12 conveys further useful information on notingthat VFs as ratios are multiplicative and divisible Forexample take the model codedG = [1 1 0] whereX1015840
1X2and
X10158401X3are retained at their initial values under X1015840
0X0 namely
194533 and 242362 from Table 9 with (X2X3) unlinked
Now taking G = [0 1 0] as reference we extract the VFson dividing elements in Table 12 at G = [0 1 0] by those ofG = [1 1 0] The result is
[VF (1205730) VF (120573
1) VF (120573
2) VF (120573
3)]
= [09196
0984810073
1005710282
1020810208
10208]
= [09338 10017 10072 10000]
(29)
8 Conclusions
Our goal is clarity in the actual workings of VIFs as ill-conditioning diagnostics thus to identify and rectifymiscon-ceptions regarding the use of VIFs for models X
0= [1119899X]
with intercept Would-be arbiters for ldquocorrectrdquo usage havedivided between VIF
119888s for mean-centered regressors and
VIF119906s for uncentered data We take issue with both Conven-
tional but clouded vision holds that (i) VIF119906s gage variance
inflation owing to nonorthogonal columns of X0as vectors
in R119899 (ii) VIFs ge 10 and (iii) models having orthogonalmean-centered regressors are ldquoidealrdquo in possessing VIF
119888s of
value unity Reprising Remark 3 these properties widely andcorrectly understood for models in119872
119908 appear to have been
expropriated without verification for models in1198720hence the
anomaliesAccordingly we have distinguished vector-space orthog-
onality (119881perp) of columns of X0 in contrast to orthogonality
(119862perp) of mean-centered columns of X A key feature is theconstruction of second-moment arrays as Reference models(R119881R119862) encoding orthogonalities of these types and their
Variance Factors as VF119907rsquos and VF
119888rsquos respectively Contrary
to convention we demonstrate analytically and through casestudies that (i1015840) VIF
119906s do not gage variance inflation (ii1015840)
VFs ge 10 need not hold whereas VF lt 10 representsVariance Deflation and (iii1015840) models having orthogonalcentered regressors are not necessarily ldquoidealrdquo In particularvariance deflation occurs when ill-conditioned data yieldsmaller variances than corresponding orthogonal surrogates
In short our studies call out to adjudicate the prolongeddispute regarding centered and uncentered diagnostics formodels in119872
0 Our summary conclusions follow
(S1) The choice of a model with or without intercept isa substantive matter in the context of each exper-imental paradigm That choice is fixed beforehandin a particular setting is structural and is beyondthe scope of the present study Snee and Marquardt[10] correctly assert that VIFs are fully creditedand remain undisputed in models without interceptthese quantities unambiguously retain the meaningsoriginally intended namely as ratios of variances togage effects of nonorthogonal regressors For modelswith intercept the choice is between centered anduncentered diagnostics which is to be weighed asfollows
(S2) ConventionalVIF119888s apply incompletely to slopes only
not necessarily grasping fully that a system is illconditioned Of particular concern is missing evi-dence on collinearity of regressors with the interceptOn the other hand if 119862
perp-orthogonality pertainsthen VF
119888s may supplant VIF
119888s as applicable to all of
[1205730 1205731 120573
119896] Moreover the latter may reveal that
VF119888(1205730) lt 10 as in cited examples and of special
import in predicting near the origin of the coordinatesystem
(S3) Our VF119907s capture correctly the concept apparently
intended by conventional VIF119906s Specifically if 119881perp-
orthogonality pertains then VF119907s as genuine ratios
of variances may supplant VIF119906s as now discredited
gages for variance inflation(S4) Returning to Section 32 we subscribe fully but for
different reasons to Belsleyrsquos and othersrsquo contention
14 Advances in Decision Sciences
Table 11 Matrices R119862and Rminus1
119862for the body fat data adjusted to give off-diagonal elements the value 00 except X10158401X2 code G = [1 0 0]
R119862
Rminus1119862
20 194365 194893 192934 04839 minus01465 minus01497 minus01510194365 25 194533 187499 minus01465 01648 minus00141 0194893 194533 25 188007 minus01497 minus00141 01676 0192934 187499 188007 25 minus01510 0 0 01565
Table 12 Summary VF119888s for the body fat data centered to minima and scaled to common lengths identified with varying G = [119892
12 11989213 11989223]
from Definition 12
Elements of G VF119888s
11989212
11989213
11989223
1205730
1205731
1205732
1205733
0 0 0 06665 43996 10282 44586
1 0 0 07002 43681 10208 44586
0 1 0 09196 10073 10282 10208
0 0 1 07201 43996 10073 43681
1 1 0 09848 10057 10208 10208
1 0 1 07595 43681 10003 43681
0 1 1 10248 10073 10073 10160
that uncentered VIF119906s are essential in assessing ill-
conditioned data In the geometry of ill-conditioningthese should be retained in the pantheon of regres-sion diagnostics not as now debunked gages ofvariance inflation but as more compelling angularmeasures of the collinearity of each column of X
0=
[1119899X1X2 X
119896] with the subspace spanned by
remaining columns(S5) Specifically the degree of collinearity of regressors
with the intercept is quantified by 1205790deg which
if small is known to ldquocorrupt the estimates of allparameters in the model whether or not the interceptis itself of interest and whether or not the data havebeen centeredrdquo [21 p90]This in turnwould appear toelevate 120579
0in importance as a critical ill-conditioning
diagnostic(S6) In contrast Theorem 4(iii) identifies conventional
VIF119888s with angles between a centered regressor and
the subspace spanned by remaining centered regres-sors For example
1205791= arccos([
[
1 minus1
VIFc ( ldquo1205731)]
]
12
) (30)
is the angle between the centered vector Z1= B119899X1
and the remaining centered vectors These lie in thelinear subspace of R119899 comprising the orthogonalcomplement to 1
119899isin R119899 and thus bear on the
geometry of ill-conditioning in this subspace(S7) Whereas conventional VIFs are ldquoall or nonerdquo to
be gaged against strictly diagonal components ourReference models enable unlinking what may beunlinked while leaving other pairs of regressorslinked This enables us to assess effects on variances
attributed to allowable partial dependencies amongsome but not other regressors
In conclusion the foregoing summary points to limita-tions in modern software packages Minitab v15 119878119875119878119878 v19and 119878119860119878 v92 are standard software packages which computecollinearity diagnostics returning with the default optionsthe centered VIFs VIF
119888(120573119895) 1 le 119895 le 119896 To compute
the uncentered VIF119906s one needs to define a variable for
the constant term and perform a linear regression using theoptions of fitting without an intercept On the other handcomputations for VF
119888s and VF
119907s as introduced here require
additional if straightforward programming for exampleMaple and PROC IML of the SAS System as utilized here
References
[1] D A Belsley ldquoDemeaning conditioning diagnostics throughcenteringrdquo The American Statistician vol 38 no 2 pp 73ndash771984
[2] E R Mansfield and B P Helms ldquoDetecting multicollinearityrdquoThe American Statistician vol 36 no 3 pp 158ndash160 1982
[3] S D Simon and J P Lesage ldquoThe impact of collinearityinvolving the intercept term on the numerical accuracy ofregressionrdquo Computer Science in Economics and Managementvol 1 no 2 pp 137ndash152 1988
[4] D W Marquardt and R D Snee ldquoRidge regression in practicerdquoThe American Statistician vol 29 no 1 pp 3ndash20 1975
[5] K N Berk ldquoTolerance and condition in regression computa-tionsrdquo Journal of the American Statistical Association vol 72 no360 part 1 pp 863ndash866 1977
[6] A E Beaton D Rubin and J Barone ldquoThe acceptability ofregression solutions another look at computational accuracyrdquoJournal of the American Statistical Association vol 71 no 353pp 158ndash168 1976
Advances in Decision Sciences 15
[7] R B Davies and B Hutton ldquoThe effect of errors in theindependent variables in linear regressionrdquo Biometrika vol 62no 2 pp 383ndash391 1975
[8] D W Marquardt ldquoGeneralized inverses ridge regressionbiased linear estimation and nonlinear estimationrdquo Technomet-rics vol 12 no 3 pp 591ndash612 1970
[9] G W Stewart ldquoCollinearity and least squares regressionrdquoStatistical Science vol 2 no 1 pp 68ndash84 1987
[10] R D Snee and D W Marquardt ldquoComment collinearitydiagnostics depend on the domain of prediction themodel andthe datardquo The American Statistician vol 38 no 2 pp 83ndash871984
[11] R HMyers Classical andModern Regression with ApplicationsPWS-Kent Boston Mass USA 2nd edition 1990
[12] D A Belsley E Kuh and R E Welsch Regression DiagnosticsIdentifying Influential Data and Sources of Collinearity JohnWiley amp Sons New York NY USA 1980
[13] A van der Sluis ldquoCondition numbers and equilibration ofmatricesrdquo Numerische Mathematik vol 14 pp 14ndash23 1969
[14] G C S Wang ldquoHow to handle multicollinearity in regressionmodelingrdquo Journal of Business Forecasting Methods amp Systemvol 15 no 1 pp 23ndash27 1996
[15] G C S Wang and C L Jain Regression Analysis Modeling ampForecasting Graceway Flushing NY USA 2003
[16] N Adnan M H Ahmad and R Adnan ldquoA comparative studyon some methods for handling multicollinearity problemsrdquoMatematika UTM vol 22 pp 109ndash119 2006
[17] D R Jensen and D E Ramirez ldquoAnomalies in the foundationsof ridge regressionrdquo International Statistical Review vol 76 pp89ndash105 2008
[18] D R Jensen and D E Ramirez ldquoSurrogate models in ill-conditioned systemsrdquo Journal of Statistical Planning and Infer-ence vol 140 no 7 pp 2069ndash2077 2010
[19] P F Velleman and R E Welsch ldquoEfficient computing ofregression diagnosticsrdquo The American Statistician vol 35 no4 pp 234ndash242 1981
[20] R F Gunst ldquoRegression analysis with multicollinear predictorvariables definition detection and effectsrdquo Communications inStatistics A vol 12 no 19 pp 2217ndash2260 1983
[21] G W Stewart ldquoComment well-conditioned collinearityindicesrdquo Statistical Science vol 2 no 1 pp 86ndash91 1987
[22] D A Belsley ldquoCentering the constant first-differencing andassessing conditioningrdquo in Model Reliability D A Belsley andE Kuh Eds pp 117ndash153 MIT Press Cambridge Mass USA1986
[23] G Smith and F Campbell ldquoA critique of some ridge regressionmethodsrdquo Journal of the American Statistical Association vol 75no 369 pp 74ndash103 1980
[24] R M Orsquobrien ldquoA caution regarding rules of thumb for varianceinflation factorsrdquoQualityampQuantity vol 41 no 5 pp 673ndash6902007
[25] R F Gunst ldquoComment toward a balanced assessment ofcollinearity diagnosticsrdquo The American Statistician vol 38 pp79ndash82 1984
[26] F A Graybill Introduction to Matrices with Applications inStatistics Wadsworth Belmont Calif USA 1969
[27] D Sengupta and P Bhimasankaram ldquoOn the roles of observa-tions in collinearity in the linearmodelrdquo Journal of the AmericanStatistical Association vol 92 no 439 pp 1024ndash1032 1997
[28] J O van Vuuren and W J Conradie ldquoDiagnostics and plotsfor identifying collinearity-influential observations and for clas-sifying multivariate outliersrdquo South African Statistical Journalvol 34 no 2 pp 135ndash161 2000
[29] R R HockingMethods and Applications of LinearModels JohnWiley amp Sons Hoboken NJ USA 2nd edition 2003
[30] R R Hocking and O J Pendleton ldquoThe regression dilemmardquoCommunications in Statistics vol 12 pp 497ndash527 1983
Advances in Decision Sciences 5
subspaces To these ends choose a typical x119895in X0 and
rearrange X0as [x119895X[119895]] We next seek elements of
Q1015840119895(X10158400X0)minus1
Q119895= [
x1015840119895x119895
x1015840119895X[119895]
X1015840[119895]x119895X1015840[119895]X[119895]
]
minus1
(8)
as reordered by the permutation matrix Q119895 From the clock-
wise rule the (1 1) element of the inverse is
[x1015840119895(I119899minus P119895) x119895]minus1
= [x1015840119895x119895minus x1015840119895P119895x119895]minus1
=10038171003817100381710038171003817xdagger119895
10038171003817100381710038171003817
2 (9)
in succession for each 119895 = 0 1 119896 where P119895
=
X[119895][X1015840[119895]X[119895]]minus1X1015840[119895]
is the projection operator onto the sub-space S
119901(X[119895]) sub R119899 spanned by the columns of X
[119895] These
in turn enable us to connect 1205812119906119895 119895 = 0 1 119896 and similarly
for centered values 1205812119888119895 119895 = 1 119896 to the geometry of ill-
conditioning as follows
Theorem 4 For models in 1198720let VIF
119906(120573119895) = 120581
2
119906119895 119895 = 0
1 119896 be conventional uncentered VIF119906s in terms of Stew-
artrsquos [9] uncentered collinearity indices These in turn quantifythe extent of collinearities between subspaces through angles (in119889119890119892 as follows
(i) Angles between (x119895S119901(X[119895])) are given by 120579
119895=
arccos[(1minus11205812uj)12
] in succession for 119895 = 0 1 119896
(ii) In particular 1205790
= arccos[(1 minus 11205812u0)12
] quantifiesthe degree of collinearity between the regressors and theconstant vector
(iii) Similarly let B119899X = Z = [z
1 z
119896] be regressors
centered to their means rearrange as [z119895Z[119895]] and let
VIF119888(120573119895) = 120581
2
119888119895 119895 = 1 119896 be centered VIF
119888s in
terms of Stewartrsquos centered collinearity indices Thenangles (in deg) between (z
119895S119901(Z[119895])) are given by
120579119895= arccos[(1 minus 11205812cj)
12] 1 le j le k
Proof From the geometry of the right triangle formed by(x119895P119895x119895) the squared lengths satisfy x
1198952
= P119895x1198952+
x119895minus P119895x1198952 where x
119895minus P119895x1198952
= RS119895is the residual sum
of squares from the projection It follows that the principalangle between (x
119895P119895x119895) is given by
cos (120579119895)=
x1015840119895P119895x119895
10038171003817100381710038171003817x119895
10038171003817100381710038171003817sdot10038171003817100381710038171003817P119895x119895
10038171003817100381710038171003817
=
10038171003817100381710038171003817P119895x119895
1003817100381710038171003817100381710038171003817100381710038171003817x119895
10038171003817100381710038171003817
=radic1 minusRS119895
10038171003817100381710038171003817x119895
10038171003817100381710038171003817
2=radic1 minus
1
1205812119906119895
(10)
for 119895 = 0 1 119896 to give conclusion (i) Conclusion(ii) follows on specializing (x
0P0x0) with x
0= 1119899and
P0= X(X1015840X)
minus1X1015840 Conclusion (iii) follows similarly from thegeometry of the right triangle formed by (z
119895P119895z119895) where P
119895
now is the projection operator onto the subspace S119901(Z[119895]) sub
R119899 spanned by the columns of Z[119895] to complete our proof
Remark 5 Rules of thumb in common use for problematicVIFs include those exceeding 10 or even 4 see [11 24] forexample In angular measure these correspond respectivelyto 120579119895lt 18435 deg and 120579
119895lt 300 deg
42 Reference Models We seek as Reference feasible modelsencoding orthogonalities of types 119881
perp and 119862perp The keys
are as follows (i) to retain essentials of the experimentalstructure and (ii) to alter what may be changed to achieveorthogonality For a model in119872
119908with moment matrix X1015840X
our opening paragraph prescribes as reference the modelR119881
= D = Diag(11988911 119889
119901119901) as diagonal elements of
X1015840X for assessing 119881perp-orthogonality Moreover on scaling
columns of X to equal lengths R119881is perfectly conditioned
with 1198881(R119881) = 10 In addition every model in 119872
119908clearly
conforms with its Reference in the sense that R119881is positive
definite as distinct from models in1198720to follow
Consider again models in 1198720as in (6) let C = (X1015840X minus
M) withM = 119899XX1015840 as the mean-centering matrix and againletD = Diag(119889
11 119889
119896119896) comprise the diagonal elements of
X1015840X
(i) 119881perpReference ModelThe uncentered VIF119906s in119872
0 defined
as ratios of diagonal elements of (X10158400X0)minus1 to reciprocals of
diagonal elements of X10158400X0 appear to have seen exclusive
usage apparently in keepingwithRemark 3However the fol-lowing disclaimermust be registered as the formal equivalentof Remark 1
Theorem 6 Statements that conventional VIF119906s quantify var-
iance inflation owing to nondiagonalX10158400X0are false for models
in1198720having X = 0
Proof Since the Reference variances are reciprocals of diag-onal elements of X1015840
0X0 this usage is predicated on the false
postulate that X10158400X0can be diagonal for X = 0 Specifically
[1119899X] are linearly independent that is 11015840
119899X = 0 if and only
if X has been mean centered beforehand
To the contrary Gunst [25] purports to show thatVIF119906(1205730) registers genuine variance inflation namely the
price to be paid in variance for designing an experimenthaving X = 0 as opposed to X = 0 Since variances forintercepts are Var(sdot | X = 0) = 120590
2119899 and Var(sdot | X = 0) =
120590211988800
ge 1205902119899 from (6) their ratio is shown in [25] to be
1205812
1199060= 11989911988800
ge 1 in the parlance of Section 23 We concede thisto be a ratio of variances but to the contrary not a VIF sincethe parameters differ In particular Var(120573
0| X = 0) = 120590
211988800
whereas Var( | X = 0) = 1205902119899 with 120572 = 120573
0+ 12057311198831+
sdot sdot sdot + 120573119896119883119896in centered regressors Nonetheless we still find
the conventional VIF119906(1205730) to be useful for purposes to follow
Remark 7 Section 3 highlights continuing concern in regardto collinearity of regressors with the constant vectorTheorem 4(ii) and expression (10) support the use of 120579
0=
arccos[(1 minus 11205812u0)12
] as an informative gage on the extent of
6 Advances in Decision Sciences
this occurrence Specifically the smaller the angle the greaterthe extent of such collinearity
Instead of conventional VIF119906s given the foregoing dis-
claimer we have the following amended version as Referencefor uncentered diagnostics altering whatmay be changed butleaving X = 0 intact
Definition 8 Given a model in 1198720with second-moment
matrixX10158400X0 the amendedReferencemodel for assessing119881perp-
orthogonality is R119881
= [ 119899 119899X1015840
119899X D] with D = Diag(119889
11 119889
119896119896)
as diagonal elements of X1015840X provided that R119881is positive
definite We identify a model to be 119881perp-orthogonal when
X10158400X0= R119881
As anticipated a prospective R119881
fails to conform toexperimental data if not positive definite These and furtherprospects are covered in the following where 120601
119894designates
the angle between (1119899X119894) 119894 = 1 119896
Lemma 9 Take R119881
as a prospective Reference for 119881perp-
orthogonality
(i) In order that R119881maybe positive definite it is necessary
that 119899X1015840Dminus1X lt 1 that is that 119899sum119896
119894=1(1198832
119894119889119894119894) lt 1
(ii) Equivalently it is necessary thatsum119896119894=1
cos2(120601119894) lt 1with
120601119894as the angle between (1
119899Xi) 119894 = 1 119896
(iii) The Reference variance for 1205730is Var(120573
0| X1015840X = D) =
[119899(1 minus 119899X1015840Dminus1X)]minus1
(iv) The Reference variances for slopes are given by
Var (120573119894| X1015840X = D) =
1
119889119894119894
[1 +1205741198832
119894
119889119894119894
] 1 le 119894 le 119896 (11)
where 120574 = 119899(1 minus 119899X1015840Dminus1X)
Proof The clockwise rule for determinants gives |R119881| =
|D|[119899 minus 1198992X1015840Dminus1X] Conclusion (i) follows since |D| gt 0
The computation
cos (120601119894) =
11015840119899X119894
1003817100381710038171003817111989910038171003817100381710038171003817100381710038171003817X119894
1003817100381710038171003817=
11015840119899X119894
radic119899119889119894119894
= radic1198991198832
119894
119889119894119894
(12)
in parallel with (10) gives conclusion (ii) Using the clockwiserule for block-partitioned inverses the (1 1) element of Rminus1
119881
is given by conclusion (iii) Similarly the lower right block ofRminus1119881 of order (119896 times 119896) is the inverse of H = D minus 119899XX1015840 On
identifying a = b = X 120572 = 119899 and 119886lowast
119894= 119886119894119889119894119894 in Theorem
833 of [26] we have that Hminus1 = Dminus1 + 120574Dminus1XX1015840Dminus1Conclusion (iv) follows on extracting its diagonal elementsto complete our proof
Corollary 10 For the case 119896 = 2 in order that R119881maybe
positive definite it is necessary that 1206011+ 1206012gt 90 deg
Proof Beginning with Lemma 9(ii) compute
cos (1206011+ 1206012)
= cos1206011cos1206012minus sin120601
1sin1206012
= cos1206011cos1206012minus[1minus(cos2120601
1+cos2120601
2)+cos2120601
1cos21206012]12
(13)
which is lt0 when 1 minus (cos21206011+ cos2120601
2) gt 0
Moreover the matrix R119881itself is intrinsically ill conditioned
owing to X = 0 its condition number depending on X Toquantify this dependence we have the following wherecolumns of X have been standardized to common lengths|X119894| = 119888 1 le 119894 le 119896
Lemma 11 Let R119881= [ 119899 T1015840
T 119888I119896
] as in Definition 8 with T = 119899XD = 119888I
119896 and 119888 gt 0
(i) The ordered eigenvalues of R119881are [120582
1 119888 119888 120582
119901]
where 119901 = 119896 + 1 and (1205821 120582119901) are the roots of 120582 isin
[(119899 + 119888) plusmn radic(119899 minus 119888)2+ 41205912]2 and 120591
2= T1015840T
(ii) The roots are positive andR119881is positive definite if and
only if (119899119888 minus 1205912) gt 0
(iii) If R119881
is positive definite its condition number is1198881(R119881) = 1205821120582119901and is increasing in 120591
2
Proof Eigenvalues are roots of the determinantal equation
Det [(119899 minus 120582) T1015840T (119888 minus 120582) I
119896
]
= 0 lArrrArr (119888 minus 120582)119896minus1
[(119899 minus 120582) (119888 minus 120582) minus T1015840T] = 0
(14)
from the clockwise rule giving (119896 minus 1) values 120582 = 119888 and tworoots of the quadratic equation 120582
2minus (119899 + 119888)120582 + 119899119888 minus 120591
2= 0
to give conclusion (i) Conclusion (ii) holds since the productof roots of the quadratic equation is (119899119888 minus 120591
2) and the greater
root is positive Conclusion (iii) follows directly to completeour proof
(119894119894) 119862perp Reference Model As noted the notion of 119862
perp-or-thogonality applies exclusively for models in 119872
0 Accord-
ingly as Reference we seek to alter X1015840X so that the matrixcomprising sums of squares and sums of products of devia-tions from means thus altered is diagonal To achieve thiscanon of 119862
perp-orthogonality and to anticipate notation forsubsequent use we have the following
Definition 12 (i) For a model in 1198720with second-moment
matrix X10158400X0and inverse as in (6) let C
119908= (W minus M) and
identify an indicator vector G = [11989212 11989213 119892
1119896 11989223
1198922119896 119892
119896minus1119896] of order ((119896 minus 1) times (119896 minus 2)) in lexicographic
order where 119892119894119895= 0 119894 = 119895 if the (119894 119895) element of C
119908is zero
and 119892119894119895= 1 119894 = 119895 if the (119894 119895) element ofC
119908is (X1015840119894X119895minus119899119883119894119883119895)
that is unchanged from C = (X1015840X minus 119899XX1015840) = Z1015840Z withZ = B
119899X as the regressors centered to their means
Advances in Decision Sciences 7
(ii) In particular the Reference model in1198720for assessing
119862perp-orthogonality is R
119862= [ 119899 119899X
1015840
119899X W] such that C
119908= (W minus
M) and its inverse G from (6) are diagonal that is takingW = [119908
119894119895] such that 119908
119894119894= 119889119894119894 1 le 119894 le 119896 and 119908
119894119895=
119898119894119895 for all 119894 = 119895 or equivalently G = [0 0 0] In this
case we identify the model to be 119862perp-orthogonal
Recall that conventional VIF119888s for [120573
1 1205732 120573
119896] are
ratios of diagonal elements of (Z1015840Z)minus1 to reciprocals ofthe diagonal elements of the centered Z1015840Z Apparently thisrests on the logic that 119862perp-orthogonality in 119872
0implies that
Z1015840Z is diagonal However the converse fails and instead isembodied in R
119862isin 1198720 In consequence conventional VIF
119888s
are deficient in applying only to slopes whereas the VF119888s
resulting from Definition 12(ii) apply informatively for all of[1205730 1205731 1205732 120573
119896]
Definition 13 Designate VF119907(sdot) and VF
119888(sdot) as Variance Fac-
tors resulting from Reference models of Definition 8 andDefinition 12(ii) respectively On occasion these representVariance Deflation in addition to VIFs
Essential invariance properties pertain ConventionalVIF119888s are scale invariant see Section 3 of [9]We see next that
they are translation invariant as well To these ends we shiftthe columns of X=[X
1X2 X
119896] to a new origin through
X rarr X minus L = H where L = [11988811119899 11988821119899 119888
1198961119899] The
resulting model is
Y = (12057301119899+ L120573) +H120573 + 120598
lArrrArr Y = (1205730+ 120595) 1
119899+H120573 + 120598
(15)
thus preserving slopes where120595 = c1015840120573=(11988811205731+11988821205732+sdot sdot sdot+119888
119896120573119896)
Corresponding to X0= [1119899X] is U
0= [1119899H] and to X1015840
0X0
and its inverse are
U10158400U0= [
119899 11015840119899H
H10158401119899
H1015840H] (U10158400U0)minus1
= [11988700
b01
b101584001
G ] (16)
in the form of (6)This pertains to subsequent developmentsand basic invariance results emerge as follows
Lemma 14 Consider Y = 12057301119899+ X120573 + 120598 together with the
shifted version Y = (1205730+ 120595)1
119899+ H120573 + 120576 both in 119872
0 Then
the matrices G appearing in (6) and (16) are identical
Proof Rules for block-partitioned inverses again assert thatG of (16) is the inverse of
(H1015840H minus 119899minus1H10158401
11989911015840119899H) = H1015840 (I
119899minus 119899minus1111989911015840119899)H
= H1015840B119899H = X1015840B
119899X
(17)
since B119899L = 0 to complete our proof
These facts in turn support subsequent claims that cen-tered VIF
119888s are translation and scale invariant for slopes
1015840
=
[1205731 1205732 120573
119896] apart from the intercept 120573
0
43 A Critique Again we distinguish VIF119888s and VIF
119906s from
centered and uncentered regressorsThe following commentsapply
(C1) A design is either 119881perp or 119862perp-orthogonal respectivelyaccording as the lower right (119896times119896) blockX1015840X ofX1015840
0X0
orC = (X1015840Xminus119899XX1015840) from expression (6) is diagonalOrthogonalities of type 119881perp and 119862
perp are exclusive andhence work at crossed purposes
(C2) In particular 119862perp-orthogonality holds if and only ifthe columns of X are 119881
perp-nonorthogonal If X is 119862perp-orthogonal then for 119894 = 119895 C
119908(119894119895) = X1015840
119894X119895minus 119899119883119894119883119895=
0 and X1015840119894X119895
= 0 Conversely if X indeed is 119881perp-
orthogonal so that X1015840X = D= Diag(11988911 119889
119896119896)
then (D minus 119899XX1015840) cannot be diagonal as Reference inwhich case the conventional VIF
119888s are tenuous
(C3) Conventional VIF119906s based on uncentered X in 119872
0
with X = 0 do not gage variance inflation as claimedfounded on the false tenet that X1015840
0X0can be diagonal
(C4) To detect influential observations and to classify highleverage points casendashinfluence diagnostics namely119891119895119868
= VIF119895(minus119868)
VIF119895 1 le 119895 le 119896 are studied in [27]
for assessing the impact of subsets on variance infla-tion Here VIF
119895is from the full data and VIF
119895(minus119868)on
deleting observations in the index set 119868 Similarly [28]proposes using VIF
(119894)on deleting the 119894th observation
In the present context these would gain substance onmodifying VF
119907and VF
119888accordingly
5 Case Study 1 Continued
51 The Setting We continue an elementary and transparentexample to illustrate essentials Recall119872
0 119884119894= 1205730+12057311198831+
12057321198832+120598119894 of Section 22 the designX
0= [15X1X2] of order
(5 times 3) and U = X10158400X0 and its inverse V = (X10158400X0)minus1 as in
expressions (1) The design is neither 119881perp nor 119862perp orthogonalsince neither the lower right (2 times 2) block of X1015840
0X0nor the
centered Z1015840Z is diagonal Moreover the uncentered VIF119906s as
listed in (3) are not the vaunted relative increases in variancesowing to nonorthogonal columns of X
0 Indeed the only
opportunity for 119881perp-orthogonality here is between columns
[X1X2]
Nonetheless from Section 41 we utilize Theorem 4(i)and (10) to connect the collinearity indices of [9] to anglesbetween subspaces Specifically Minitab recovered valuesfor the residuals RS
119895= x
119895minus P119895x1198952 119895 = 0 1 2 fur-
ther computations proceed routinely as listed in Table 1 Inparticular the principal angle 120579
0between the constant vector
and the span of the regressor vectors is 1205790= 31751 deg as
anticipated in Remark 7
52 119881perp-Orthogonality For subsequent reference let 119863(119909) bea design as in (1) but with second-moment matrix
U (119909) = [
[
5 3 1
3 25 119909
1 119909 3
]
]
(18)
8 Advances in Decision Sciences
Table 1 Values for residuals RS119895= x119895minus P119895x1198952 for X
1198952 for 1205812
119906119895= VI F
119906(120573119895) for (1 minus 1120581
2
119906119895)12 and for 120579
119895 for 119895 = 0 1 2
119895 RS119895
X1198952
1205812
119895(1 minus 1120581
2
119906119895)12
120579119895(deg)
0 13846 50 36111 08503 317511 06429 25 38889 08619 304702 25714 30 11667 03780 67792
Invoking Definition 8 we seek a 119881perp-orthogonal Reference
with moment matrix U(0) having X10158401X2
= 0 that is 119863(0)This is found constructively on retaining [1
5 ∙X2] in X
0
but replacing X1by X01
= [1 05 05 1 0]1015840 as listed in
Table 2 giving the design 119863(0) with columns [15X01X2] as
ReferenceAccordingly R
119881= U(0) is ldquoas 119881
perp-orthogonal as it cangetrdquo given the lengths and sums of (X
1X2) as prescribed
in the experiment and as preserved in the Reference modelLemma 9 with X=[06 02]
1015840 and D= Diag(25 3) givesX1015840Dminus1X = 0157333 and 120574 = 5(1 minus 5X1015840Dminus1X) =
234375 Applications of Lemma 9(iii)-(iv) in succession givethe reference variances
Var (1205730| X1015840X = D) = [5 (1 minus 5X1015840Dminus1X)]
minus1
= 09375
Var (1205731|X1015840X=D)=
1
11988911
[1+1205741198832
1
11988911
]=2
5[1+
072120574
5]=17500
Var (1205732|X1015840X=D)=
1
11988922
[1+1205741198832
2
11988922
]=1
3[1+
004120574
3]=04375
(19)
As Reference these combine with actual variances from119881 = [119881
119894119895] at (1) giving VF
119907for the original designX
0relative
to R119881 For example VF
119907(1205730) = 0722209375 = 07704 with
11988111
= 07222 from (1) and 09375 from (19) This contrastswith VIF
119906(1205730) = 36111 in (3) as in Section 22 and Table 2
it is noteworthy that the nonorthogonal design 119863(1) yieldsuniformly smaller variances than the 119881
perp-orthogonal R119881
namely119863(0)Further versions 119863(119909) 119909 isin [minus05 00 05 06 10 15]
of our basic design are listed for comparison in Table 2 where119909 = X1015840
1X2for the various designs The designs themselves are
exhibited on transposing rowsX10158401= [11988311 11988312 11988313 11988314 11988315]
from Table 2 and substituting each into the template X00
=
[15 ∙X2] The design 119863(2) was constructed but not listed
since its X10158400X0is not invertible Clearly these are actual
designs amenable to experimental deployment
53 119862perp-Orthogonality Continuing X0
= [15X1X2] and
invoking Definition 12(ii) we seek a 119862perp-orthogonal
Reference having the matrix C119908
as diagonal This is
identified as 119863(06) in Table 2 From this the matrix R119862and
its inverse are
R119862= [
[
5 3 1
3 25 06
1 06 3
]
]
Rminus1119862
= [
[
07286 minus08572 minus00714
minus08571 14286 00
minus00714 00 03571
]
]
(20)
The variance factors are listed in (4) where for exampleVF119888(1205732) = 0388903571 = 10889 As distinct from conven-
tional VIF119888s for 120573
1and 120573
2only our VF
119888(1205730) here reflects
Variance Deflationwherein 1205730is estimated more precisely in
the initial 119862perp-nonorthogonal designA further note on orthogonality is germane Suppose
that the actual experiment is 119863(0) yet towards a thoroughdiagnosis the user evaluates the conventional VIF
119888(1205731) =
12250 = VIF119888(1205732) as ratios of variances of 119863(0) to 119863(06)
from Table 2 Unfortunately their meaning is obscured sincea design cannot at once be 119881perp and 119862
perp-orthogonalAs in Remark 2 we next alter the basic designX
0at (1) on
shifting columns of themeasurements to [0 0] asminima andscaling to have squared lengths equal to 119899 = 5 The resultingX0follows directly from (1) The new matrix U = X
1015840
0X0 andits inverse V are
U = [
[
5 42426 42426
42426 5 4
42426 4 5
]
]
V = [
[
1 minus04714 minus04714
minus04714 07778 minus02222
minus04714 minus02222 07778
]
]
(21)
giving conventional VIF119906s as [5 38889 38889] Against 119862perp-
orthogonality this gives the diagnostics
[VF119888(1205730) VF119888(1205731) VF119888(1205732)] = [08140 10889 10889]
(22)
demonstrating in comparison with (4) that VF119888s apart from
1205730 are invariant under translation and scaling a consequence
of Lemma 14We further seek to compare variances for the shifted
and scaled (X1X2) against 119881perp-orthogonality as Reference
However from U = X10158400X0at (21) we determine that 119883
1=
084853 = 1198832and 5[(119883
2
15) + (119883
2
25)] = 14400 Since
the latter exceeds unity we infer from Lemma 9(i) that
Advances in Decision Sciences 9
Table 2 Designs 119863(119909) 119909 isin [minus05 00 05 06 10 15] obtained by substitutingX01= [11988311 11988312 11988313 11988314 11988315]1015840 in [15 ∙X2] of order (5times3)
and corresponding variances Var (1205730)Var (120573
1)Var (120573
2)
Design 11988311
11988312
11988313
11988314
11988315
Var (1205730) Var (120573
1) Var (120573
2)
119863(minus05) 10 00 05 10 05 19333 37333 09333119863(00) 10 05 05 10 00 09375 17500 04375119863(05) 05 10 00 10 05 07436 14359 03590119863(06) 06091 05793 06298 11819 0000 07286 14286 03570119863(10) 00 05 05 10 10 07222 15556 03889119863(15) 00 10 05 05 10 09130 24348 06087
119881perp-orthogonality is incompatible with this configuration
of the data so that the VF119907s are undefined Equivalently
on evaluating 1206011
= 1206012
= 2290 deg and 1206011+ 1206012
=
4580 lt 90 deg Corollary 10 asserts that R119881is not positive
definite This appears to be anomalous until we recallthat 119881perp-orthogonality is invariant under rescaling but notrecentering the regressors
54 A Critique We reiterate apparent ambiguity ascribedin the literature to orthogonality A number of widely heldguiding precepts has evolved as enumerated in part in ouropening paragraph and in Section 3 As in Remark 3 theseclearly have been taken to apply verbatim forX
0andX1015840
0X0in
1198720 to be paraphrased in part as follows(P1) Ill-conditioning espouses inflated variances that is
VIFs necessarily equal or exceed unity(P2) 119862perp-orthogonal designs are ldquoidealrdquo in that VIF
119888s for
such designs are all unity see [11] for exampleWe next reassess these precepts as they play out under119881perp and 119862
perp orthogonality in connection with Table 2(C1) For models in 119872
0having X = 0 we reiterate that
the uncentered VIF119906s at expression (3) namely
[36111 38889 11667] overstate adverse effects onvariances of the 119881perp-nonorthogonal array since X1015840
0X0
cannot be diagonal Moreover these conventionalvalues fail to discern that the revised values for VF
119907s
at (5) namely [07704 08889 08889] reflect that[1205730 1205731 1205732] are estimated with greater efficiency in the
119881perp-nonorthogonal array119863(1)
(C2) For 119881perp-orthogonality the claim P1 thus is false by
counterexampleTheVFs at (5) areVarianceDeflationFactors for design 119863(1) where X1015840
1X2= 1 relative to
the 119881perp-orthogonal119863(0)(C3) Additionally the variances Var[120573
0(119909)] Var[120573
1(119909)]
Var[1205732(119909)] in Table 2 are all seen to decrease from
119863(0) [09375 17500 04375] rarr 119863(1) [0722215556 03889] despite the transition from the 119881
perp-orthogonal 119863(0) to the 119881
perp-nonorthogonal 119863(1)Similar trends are confirmed in other ill-conditioneddata sets from the literature
(C4) For 119862perp-orthogonal designs the claim P2 is false by
counterexample Such designs need not be ldquoidealrdquoas 1205730may be estimated more efficiently in a 119862
perp-nonorthogonal design as demonstrated by the VF
119888s
for 1205730at (4) and (22) Similar trends are confirmed
elsewhere as seen subsequently
(C5) VFs for 1205730are critical in prediction where prediction
variances necessarily depend on Var(1205730) especially
for predicting near the origin of the system of coor-dinates
(C6) Dissonance between 119881perp and 119862
perp is seen in Table 2where 119863(06) as 119862
perp-orthogonal with X10158401X2
= 06is the antithesis of 119881perp-orthogonality at 119863(0) whereX10158401X2= 0
(C7) In short these transparent examples serve to dispelthe decades-old mantra that ill-conditioning neces-sarily spawns inflated variances formodels in119872
0 and
they serve to illuminate the contributing structures
6 Orthogonal and Linked Arrays
A genuine 119881perp-orthogonal array was generated as eigenvec-
tors E = [E1E2 E
8] from a positive definite (8 times 8)
matrix (see Table 83 of [11 p377]) using PROC IML of theSAS programming package The columns [X
1X2X3] as the
second through fourth columns of Table 3 comprise the firstthree eigenvectors scaled to length 8 These apply for modelsin 119872119908and 119872
0to be analyzed In addition linked vectors
(Z1Z2Z3) were constructed as Z
1= 119888(09E
1+ 01E
2)
Z2
= 119888(09E2+ 01E
3) and Z
3= 119888(09E
3+ 01E
4) where
119888 = radic8082 Clearly these arrays as listed in Table 3 are notarchaic abstractions as both are amenable to experimentalimplementation
61 The Model 1198720 We consider in turn the orthogonal and
the linked series
611 Orthogonal Data Matrices X10158400X0and (X1015840
0X0)minus1 for the
orthogonal data under model1198720are listed in Table 4 where
variances occupy diagonals of (X10158400X0)minus1 The conventional
uncentered VIF119906s are [120296 102144 112256 105888]
Since X = 0 we find the angle between the constant andthe span of the regressor vectors to be 120579
0= 65748 deg as
in Theorem 4(ii) Moreover the angle between X1and the
span of [1119899X2X3] namely 120579
1= 81670 deg is not 90 deg
because of collinearity with the constant vector despite themutual orthogonality of [X
1X2X3] Observe here thatX1015840
0X0
10 Advances in Decision Sciences
Table 3 Basic 119881perp-orthogonal design X0 = [18 X1 X2 X3] and a linked design Z0 = [18 Z1 Z2 Z3] of order (8 times 4)
Orthogonal (X1X2X3) Linked (Z
1Z2Z3)
18
X1
X2
X3
18
Z1
Z2
Z3
1 minus1084470 0056899 0541778 1 minus1071550 0116381 06534861 minus1090010 minus0040490 0117026 1 minus1087820 minus0027320 01160861 0979050 0002564 2337454 1 0973345 0260677 21996871 minus1090230 0016839 0170711 1 minus1081700 0035588 00706941 0127352 2821941 minus0071670 1 0438204 2796767 minus00707101 1045409 minus0122670 minus1476870 1 1025468 minus0285010 minus16185601 1090803 minus0088970 0043354 1 1074306 minus0083640 01769281 1090739 minus0092300 0108637 1 1073875 minus0079740 0244567
Table 4 Matrices X10158400X0 and (X10158400X0)minus1 for the orthogonal data under model119872
0
X10158400X0
(X10158400X0)minus1
8 10686 25538 17704 01504 minus00201 minus00480 minus0033310686 8 0 0 minus00201 01277 00064 0004425538 0 8 0 minus00480 00064 01403 0010617704 0 0 8 minus00333 00044 00106 01324
already is 119881perp-orthogonal accordingly R119881= X10158400X0 and thus
the VF119907s are all unity
In view of dissonance between 119881perp and 119862
perp orthogonalitythe sums of squares and products for the mean-centered Xare
X1015840B119899X = [
[
78572 minus03411 minus02365
minus03411 71847 minus05652
minus02365 minus05652 76082
]
]
(23)
reflecting nonnegligible negative dependencies to distin-guish the 119881
perp-orthogonal X from the 119862perp-nonorthogonal
matrix Z = B119899X of deviations
To continue we suppose instead that the 119881perp-orthogonalmodel was to be recast as 119862
perp-orthogonal The Referencemodel and its inverse are listed in Table 5 Variance factorsas ratios of diagonal elements of (X1015840
0X0)minus1 in Table 4 to those
of Rminus1119862
in Table 5 are
[VF119888(1205730) VF119888(1205731) VF119888(1205732) VF119888(1205733)]
= [10168 10032 10082 10070]
(24)
reflecting negligible differences in precision This parallelsresults reported in Table 2 comparing 119863(0) to 119863(06) asReference except for larger differences in Table 2 namely[128681225012250]
612 Linked Data For the Linked Data under model 1198720
the matrix Z10158400Z0 follows routinely from Table 3 its inverse(Z10158400Z0)minus1 is listed in Table 6 The conventional uncentered
VIF119906s are now [120320 103408 113760 105480] These
again fail to gage variance inflation since Z10158400Z0cannot be
diagonal From (10) the principal angle between the constantvector and the span of the regressor vectors is 120579
0= 65735
degrees
Remark 15 Observe that the choice for R119862may be posed as
seeking R119862
= U(119909) such that Rminus1119862(2 3) = 00 then solving
numerically for 119909 using Maple for example However thealgorithm in Definition 12(ii) affords a direct solution thatC119908should be diagonal stipulates its off-diagonal element as
X10158401X2minus 35 = 0 so that X1015840
1X2= 06 in R
119862at expression (20)
To illustrate Theorem 4(iii) we compute cos(120579) = (1 minus
110889)12 = 02857 and 120579 = 73398 deg as the angle between
the vectors [X1X2] of (1) when centered to their means For
slopes in 1198720 [9] shows that VIF
119888(120573119895) lt VIF
119906(120573119895) 1 le
119894 le 119896 This follows since their numerators are equal butdenominators are reciprocals of lengths of the centered anduncentered regressor vectors To illustrate numerators areequal for VIF
119906(1205733) = 03889(13) = 11667 and for
VIF119888(1205733) = 03889(128) = 10889 but denominators are
reciprocals of X10158402X2= 3 and sum
5
119895=1(1198832119895
minus 1198832)2= 28
(119894) 119881perp Reference Model To rectify this malapropism we seek
in Definition 8 a model R119881as Reference This is found on
setting all off-diagonal elements of Z10158400Z0to zero excluding
the first row and column Its inverse Rminus1119881
is listed in Table 6The VF
119907s are ratios of diagonal elements of (Z1015840
0Z0)minus1 on the
left to diagonal elements of Rminus1119881
on the right namely
[VF119907(1205730) VF119907(1205731) VF119907(1205732) VF119907(1205733)]
= [09697 09991 09936 09943]
(25)
Against conventional wisdom it is counter-intuitive that[1205730 1205731 1205732 1205733] are all estimated with greater precision in the
119881perp-nonorthogonal model Z1015840
0Z0than in the 119881
perp-orthogonalmodel R
119881of Definition 8 As in Section 54 this serves again
to refute the tenet that ill-conditioning espouses inflated
Advances in Decision Sciences 11
Table 5 Matrices R119862and Rminus1
119862for checking 119862
perp-orthogonality in the orthogonal data X0 under model1198720
R119862
Rminus1119862
8 10686 25538 17704 01479 minus00170 minus00444 minus0029110686 8 03411 02365 minus00170 01273 0 025538 03411 8 05652 minus00444 0 01392 017704 02365 05652 8 minus00291 0 0 01314
Table 6 Matrices (Z10158400Z0)minus1 and Rminus1
119881for checking 119881
perp-orthogonality for the linked data under model1198720
(Z10158400Z0)minus1 Rminus1
119881
01504 minus00202 minus00461 minus00283 01551 minus00261 minus00530 minus00344minus00202 01293 minus000797 00053 minus00261 01294 00089 00058minus00461 minus00079 01422 minus00054 minus00530 00089 01431 00117minus00283 00053 minus00054 01319 minus00344 00058 00117 01326
variances for models in1198720 that is that VFs necessarily equal
or exceed unity(119894119894) 119862
perp Reference Model Further consider variances werethe Linked Data to be centered As in Definition 12(ii)we seek a Reference model R
119862such that C
119908= (W minus
M) is diagonal The required R119862
and its inverse arelisted in Table 7 Since variances in the Linked Data are[015040 012926 014220 013185] from Table 6 and Refer-ence values appear as diagonal elements of Rminus1
119862on the right
in Table 7 their ratios give the centered VF119888s as
[VF119888(1205730) VF119888(1205731) VF119888(1205732) VF119888(1205733)]
= [09920 10049 10047 10030]
(26)
We infer that changes in precision would be negligible ifinstead the Linked Data experiment was to be recast as a 119862perp-orthogonal experiment
As in Remark 15 the reference matrix R119862is constrained
by the first and second moments from X10158400X0and has free
parameters 1199091 1199092 1199093 for the (2 3) (2 4) (3 4) entries
The constraints for 119862perp-orthogonality are Rminus1
119862(2 3) =
Rminus1119862(2 4) = Rminus1
119862(3 4) = 0 Although this system of
equations can be solved with numerical software such asMaple the algorithm stated in Definition 12 easily yields thedirect solution given here
62 The Model 119872119908 For the Linked Data take 119884
119894= 12057311198851198941+
12057321198851198942
+ 12057331198851198943
+ 120598119894 1 le 119894 le 8 as Y = Z120573 + 120598
where the expected response at 1198851
= 1198852
= 1198853
= 0
is 119864(119884) = 00 and there is no occasion to translate theregressorsThe lower right (3 times 3) submatrix ofZ1015840
0Z0(4 times 4)
from 1198720is Z1015840Z its inverse is listed in Table 8 The ratios
of diagonal elements of (Z1015840Z)minus1 to reciprocals of diagonalelements of Z1015840Z are the conventional uncentered VIF
119906s
namely [10123 10247 10123] Equivalently scaling Z1015840Z tohave unit diagonals and inverting gives Cminus1 as in Table 8 itsdiagonal elements are the VIF
119906sThroughout Section 6 these
comprise the only correct interpretation of conventionalVIF119906s as genuine variance ratios
7 Case Study Body Fat Data
71 The Setting Body fat and morphogenic measurementswere reported for 119899 = 20 human subjects in Table D11 of[29 p717] Response and regressor variables are119884 amount ofbody fat X
1 triceps skinfold thickness X
2 thigh circumfer-
ence and X3 mid-arm circumference under119872
0 119884119894= 1205730+
12057311198831+12057321198832+12057331198833+ 120598119894 From the original data as reported
the condition number of X10158400X0is 1039931 and the VIF
119906s are
[2714800 4469419 630998 778703] On scaling columnsof X to equal column lengths X
119894 = 50 119894 = 1 2 3 the
resulting condition number of the revisedX10158400X0is 2 9696518
and the uncenteredVIF119906s are as before from invariance under
scalingAgainst complaints that regressors are remote from their
natural origins we center regressors to [0 0 0] as origin onsubtracting the minimal element from each column as inRemark 2 and scaling these to X
119894 = 50 119894 = 1 2 3
This is motivated on grounds that centered diagnostics apartfrom VIF
119888(1205730) are invariant under shifts and scalings from
Lemma 14 In addition under the new origin [0 0 0] 1205730
assumes prominence as the baseline response against whichchanges due to regressors are to be gaged The original dataare available on the publisherrsquos web page for [29]
Taking these shifted and scaled vectors as columnsof X = [X
1X2X3] in the revised X
0= [1
119899X]
values for X10158400X0
and its inverse are listed in Table 9The condition number of X1015840
0X0
is now 1136969 andits VIF
119906s are [6775617998742782174484] Additionally
from Theorem 4(ii) we recover the angle 1205790= 22592 deg to
quantify collinearity of regressors with the constant vectorIn addition the scaled and centered correlationmatrix for
X is
R = [
[
10 00847 08781
00847 10 01424
08781 01424 10
]
]
(27)
indicating strong linkage between mean-centered versions of(X1X3)Moreover the experimental design is neither119881perp nor
119862perp-orthogonal since neither the lower right (3 times 3) block of
X10158400X0nor the centered matrixC = (X1015840Xminus119899XX1015840) is diagonal
12 Advances in Decision Sciences
Table 7 Matrices R119862and Rminus1
119862for checking 119862
perp-orthogonality in the linked data under model1198720
R119862
Rminus1119862
8 13441 27337 17722 01516 minus00216 minus00484 minus0029113441 8 04593 02977 minus00216 01286 0 027337 04593 8 06056 minus00484 0 01415 017722 02977 06056 8 minus00291 0 0 01314
Table 8 Matrices (Z1015840Z)minus1 and Cminus1 for the linked data under model119872119908
(Z1015840Z)minus1 Cminus1
01265 minus00141 00015 10123 minus01125 00123minus00141 01281 minus00141 minus01125 10247 minus0112500015 minus00141 01265 00123 minus01125 10123
(119894) 119881perp Reference Model To compare this with a 119881
perp-or-thogonal design we turn again to Definition 8 Howeverevaluating the test criterion in Lemma 9(119894) shows that thisconfiguration is not commensurate with 119881
perp-orthogonalityso that the VF
119907s are undefined Comparisons with 119862
perp-orthogonal models are addressed next(119894119894) 119862perp Reference Model As in Definition 12(ii) the centering
matrix is
119899XX1015840 = [
[
188889 189402 187499
189402 189916 188007
187499 188007 186118
]
]
(28)
To check against 119862perp-orthogonality we obtain the matrix Wof Definition 12(ii) on replacing off-diagonal elements of thelower right (3 times 3) submatrix ofX1015840
0X0by corresponding off-
diagonal elements of 119899XX1015840 The result is the matrix R119862and
its inverse as given in Table 10 where diagonal elements ofRminus1119862
are the Reference variances The VF119888s found as ratios
of diagonal elements of (X10158400X0)minus1 relative to those of Rminus1
119862
are listed in Table 12 under G = [0 0 0] the indicator ofDefinition 12(i) for this case
72 Variance Factors and Linkage Traditional gages of ill-conditioning are patently absurd on occasion By conventionVIFs are ldquoall or nonerdquo wherein Reference models entailstrictly diagonal components In practice some regressorsare inextricably linked to get or even visualize orthogonalregressors may go beyond feasible experimental rangesrequire extrapolation beyond practical limits and challengecredulity Response functions so constrained are described in[30] as ldquopicket fencesrdquo In such cases taking C to be diagonalas its 119862perp Reference is moot at best an academic folly abettedin turn by default in standard software packages In shortconventional diagnostics here offer answers to irrelevantquestions Given pairs of regressors irrevocably linked weseek instead to assess effects on variances for other pairsthat could be unlinked by design These comments applyespecially to entries under G = [0 0 0] in Table 12
To proceed we define essential ill-conditioning as regres-sors inextricably linked necessarily to remain so and asnonessential if regressors could be unlinked by design As a
followup to Section 32 for X = 0 we infer that the constantvector is inextricably linked with regressors thus accountingfor essential ill-conditioning not removed by centeringcontrary to claims in [4] Indeed this is the essence ofDefinition 8
Fortunately these limitations may be remedied throughReference models adapted to this purpose This in turnexploits the special notation and trappings of Definition 12(i)as follows In addition toG = [0 0 0] we iterate for indicatorsG = [119892
12 11989213 11989223] taking values in [1 0 0] [0 1 0] [0 0 1]
[1 1 0] [1 0 1] [0 1 1] Values of VF119888s thus obtained are
listed in Table 12 To fix ideas the Reference R119862for G =
[1 0 0] that is for the constraint R119862(2 3) = X1015840
0X0(2 3) =
194533 is given in Table 11 together with its inverse Foundas ratios of diagonal elements of (X1015840
0X0)minus1 relative to those of
Rminus1119862
in Table 11 VF119888s are listed in Table 12 underG = [1 0 0]
In short R119862is ldquoas 119862
perp-orthogonal as it could berdquo given theconstraint Other VF
119888s in Table 12 proceed similarly For all
cases in Table 12 excluding G = [0 1 1] 1205730is estimated
with greater efficiency in the original model than any of thepostulated Reference models
As in Remark 15 the reference matrix R119862
for G =
[1 0 0] is constrained by X10158401X2
= 194533 to be retainedwhereas X1015840
1X3
= 1199091and X1015840
2X3
= 1199092are to be determined
from Rminus1119862(2 4) = Rminus1
119862(3 4) = 0 Although this system
can be solved numerically as noted the algorithm stated inDefinition 12 easily yields the direct solution given here
Recall 1198831 triceps skinfold thickness 119883
2 thigh cir-
cumference and 1198833 mid-arm circumference and from
(27) that (1198831 1198833) are strongly linked Invoking the cutoff
rule [24] for VIFs in excess of 40 it is seen for G =
[0 0 0] in Table 12 that VF119888(1205731) and VF
119888(1205733) are excessive
where [X1X2X3] are forced to be mutually uncorrelated
as Reference Remarkably similar values are reported forG isin [1 0 0] [ 0 0 1] [1 0 1] all gaged against intractableReference models wherein (119883
1 1198833) are postulated to be
uncorrelated however incrediblyOn the other hand VF
119888s for G isin [0 1 0] [1 1 0]
[0 1 1] in Table 12 are likewise comparable where (X1X3)
are allowed to remain linked at X10158401X3= 242362 Here the
VF119888s reflect negligible changes in efficiency of the estimates
Advances in Decision Sciences 13
Table 9 Matrices X10158400X0 and (X10158400X0)minus1 for the body fat data under model119872
0centered to minima and scaled to equal lengths
X10158400X0
(X10158400X0)minus1
20 194365 194893 192934 03388 minus01284 minus01482 minus00203194365 25 194533 242362 minus01284 07199 00299 minus06224194893 194533 25 196832 minus01482 00299 01711 minus00494192934 242362 196832 25 minus00203 minus06224 minus00494 06979
Table 10 Matrices R119862and Rminus1
119862for the body fat data adjusted to give off-diagonal elements the value 00 with code G = [0 0 0]
R119862
Rminus1119862
20 194365 194893 192934 05083 minus01590 minus01622 minus01510194365 25 189402 187499 minus01590 01636 0 0194893 189402 25 188007 minus01622 0 01664 0192934 187499 188007 25 minus01510 0 0 01565
in comparison with the originalX10158400X0 where [X
1X2X3] are
all linked In summary negligible changes in overall efficiencywould accrue on recasting the experiment so that (X
1X2)
and (X2X3) were pairwise uncorrelated
Table 12 conveys further useful information on notingthat VFs as ratios are multiplicative and divisible Forexample take the model codedG = [1 1 0] whereX1015840
1X2and
X10158401X3are retained at their initial values under X1015840
0X0 namely
194533 and 242362 from Table 9 with (X2X3) unlinked
Now taking G = [0 1 0] as reference we extract the VFson dividing elements in Table 12 at G = [0 1 0] by those ofG = [1 1 0] The result is
[VF (1205730) VF (120573
1) VF (120573
2) VF (120573
3)]
= [09196
0984810073
1005710282
1020810208
10208]
= [09338 10017 10072 10000]
(29)
8 Conclusions
Our goal is clarity in the actual workings of VIFs as ill-conditioning diagnostics thus to identify and rectifymiscon-ceptions regarding the use of VIFs for models X
0= [1119899X]
with intercept Would-be arbiters for ldquocorrectrdquo usage havedivided between VIF
119888s for mean-centered regressors and
VIF119906s for uncentered data We take issue with both Conven-
tional but clouded vision holds that (i) VIF119906s gage variance
inflation owing to nonorthogonal columns of X0as vectors
in R119899 (ii) VIFs ge 10 and (iii) models having orthogonalmean-centered regressors are ldquoidealrdquo in possessing VIF
119888s of
value unity Reprising Remark 3 these properties widely andcorrectly understood for models in119872
119908 appear to have been
expropriated without verification for models in1198720hence the
anomaliesAccordingly we have distinguished vector-space orthog-
onality (119881perp) of columns of X0 in contrast to orthogonality
(119862perp) of mean-centered columns of X A key feature is theconstruction of second-moment arrays as Reference models(R119881R119862) encoding orthogonalities of these types and their
Variance Factors as VF119907rsquos and VF
119888rsquos respectively Contrary
to convention we demonstrate analytically and through casestudies that (i1015840) VIF
119906s do not gage variance inflation (ii1015840)
VFs ge 10 need not hold whereas VF lt 10 representsVariance Deflation and (iii1015840) models having orthogonalcentered regressors are not necessarily ldquoidealrdquo In particularvariance deflation occurs when ill-conditioned data yieldsmaller variances than corresponding orthogonal surrogates
In short our studies call out to adjudicate the prolongeddispute regarding centered and uncentered diagnostics formodels in119872
0 Our summary conclusions follow
(S1) The choice of a model with or without intercept isa substantive matter in the context of each exper-imental paradigm That choice is fixed beforehandin a particular setting is structural and is beyondthe scope of the present study Snee and Marquardt[10] correctly assert that VIFs are fully creditedand remain undisputed in models without interceptthese quantities unambiguously retain the meaningsoriginally intended namely as ratios of variances togage effects of nonorthogonal regressors For modelswith intercept the choice is between centered anduncentered diagnostics which is to be weighed asfollows
(S2) ConventionalVIF119888s apply incompletely to slopes only
not necessarily grasping fully that a system is illconditioned Of particular concern is missing evi-dence on collinearity of regressors with the interceptOn the other hand if 119862
perp-orthogonality pertainsthen VF
119888s may supplant VIF
119888s as applicable to all of
[1205730 1205731 120573
119896] Moreover the latter may reveal that
VF119888(1205730) lt 10 as in cited examples and of special
import in predicting near the origin of the coordinatesystem
(S3) Our VF119907s capture correctly the concept apparently
intended by conventional VIF119906s Specifically if 119881perp-
orthogonality pertains then VF119907s as genuine ratios
of variances may supplant VIF119906s as now discredited
gages for variance inflation(S4) Returning to Section 32 we subscribe fully but for
different reasons to Belsleyrsquos and othersrsquo contention
14 Advances in Decision Sciences
Table 11 Matrices R119862and Rminus1
119862for the body fat data adjusted to give off-diagonal elements the value 00 except X10158401X2 code G = [1 0 0]
R119862
Rminus1119862
20 194365 194893 192934 04839 minus01465 minus01497 minus01510194365 25 194533 187499 minus01465 01648 minus00141 0194893 194533 25 188007 minus01497 minus00141 01676 0192934 187499 188007 25 minus01510 0 0 01565
Table 12 Summary VF119888s for the body fat data centered to minima and scaled to common lengths identified with varying G = [119892
12 11989213 11989223]
from Definition 12
Elements of G VF119888s
11989212
11989213
11989223
1205730
1205731
1205732
1205733
0 0 0 06665 43996 10282 44586
1 0 0 07002 43681 10208 44586
0 1 0 09196 10073 10282 10208
0 0 1 07201 43996 10073 43681
1 1 0 09848 10057 10208 10208
1 0 1 07595 43681 10003 43681
0 1 1 10248 10073 10073 10160
that uncentered VIF119906s are essential in assessing ill-
conditioned data In the geometry of ill-conditioningthese should be retained in the pantheon of regres-sion diagnostics not as now debunked gages ofvariance inflation but as more compelling angularmeasures of the collinearity of each column of X
0=
[1119899X1X2 X
119896] with the subspace spanned by
remaining columns(S5) Specifically the degree of collinearity of regressors
with the intercept is quantified by 1205790deg which
if small is known to ldquocorrupt the estimates of allparameters in the model whether or not the interceptis itself of interest and whether or not the data havebeen centeredrdquo [21 p90]This in turnwould appear toelevate 120579
0in importance as a critical ill-conditioning
diagnostic(S6) In contrast Theorem 4(iii) identifies conventional
VIF119888s with angles between a centered regressor and
the subspace spanned by remaining centered regres-sors For example
1205791= arccos([
[
1 minus1
VIFc ( ldquo1205731)]
]
12
) (30)
is the angle between the centered vector Z1= B119899X1
and the remaining centered vectors These lie in thelinear subspace of R119899 comprising the orthogonalcomplement to 1
119899isin R119899 and thus bear on the
geometry of ill-conditioning in this subspace(S7) Whereas conventional VIFs are ldquoall or nonerdquo to
be gaged against strictly diagonal components ourReference models enable unlinking what may beunlinked while leaving other pairs of regressorslinked This enables us to assess effects on variances
attributed to allowable partial dependencies amongsome but not other regressors
In conclusion the foregoing summary points to limita-tions in modern software packages Minitab v15 119878119875119878119878 v19and 119878119860119878 v92 are standard software packages which computecollinearity diagnostics returning with the default optionsthe centered VIFs VIF
119888(120573119895) 1 le 119895 le 119896 To compute
the uncentered VIF119906s one needs to define a variable for
the constant term and perform a linear regression using theoptions of fitting without an intercept On the other handcomputations for VF
119888s and VF
119907s as introduced here require
additional if straightforward programming for exampleMaple and PROC IML of the SAS System as utilized here
References
[1] D A Belsley ldquoDemeaning conditioning diagnostics throughcenteringrdquo The American Statistician vol 38 no 2 pp 73ndash771984
[2] E R Mansfield and B P Helms ldquoDetecting multicollinearityrdquoThe American Statistician vol 36 no 3 pp 158ndash160 1982
[3] S D Simon and J P Lesage ldquoThe impact of collinearityinvolving the intercept term on the numerical accuracy ofregressionrdquo Computer Science in Economics and Managementvol 1 no 2 pp 137ndash152 1988
[4] D W Marquardt and R D Snee ldquoRidge regression in practicerdquoThe American Statistician vol 29 no 1 pp 3ndash20 1975
[5] K N Berk ldquoTolerance and condition in regression computa-tionsrdquo Journal of the American Statistical Association vol 72 no360 part 1 pp 863ndash866 1977
[6] A E Beaton D Rubin and J Barone ldquoThe acceptability ofregression solutions another look at computational accuracyrdquoJournal of the American Statistical Association vol 71 no 353pp 158ndash168 1976
Advances in Decision Sciences 15
[7] R B Davies and B Hutton ldquoThe effect of errors in theindependent variables in linear regressionrdquo Biometrika vol 62no 2 pp 383ndash391 1975
[8] D W Marquardt ldquoGeneralized inverses ridge regressionbiased linear estimation and nonlinear estimationrdquo Technomet-rics vol 12 no 3 pp 591ndash612 1970
[9] G W Stewart ldquoCollinearity and least squares regressionrdquoStatistical Science vol 2 no 1 pp 68ndash84 1987
[10] R D Snee and D W Marquardt ldquoComment collinearitydiagnostics depend on the domain of prediction themodel andthe datardquo The American Statistician vol 38 no 2 pp 83ndash871984
[11] R HMyers Classical andModern Regression with ApplicationsPWS-Kent Boston Mass USA 2nd edition 1990
[12] D A Belsley E Kuh and R E Welsch Regression DiagnosticsIdentifying Influential Data and Sources of Collinearity JohnWiley amp Sons New York NY USA 1980
[13] A van der Sluis ldquoCondition numbers and equilibration ofmatricesrdquo Numerische Mathematik vol 14 pp 14ndash23 1969
[14] G C S Wang ldquoHow to handle multicollinearity in regressionmodelingrdquo Journal of Business Forecasting Methods amp Systemvol 15 no 1 pp 23ndash27 1996
[15] G C S Wang and C L Jain Regression Analysis Modeling ampForecasting Graceway Flushing NY USA 2003
[16] N Adnan M H Ahmad and R Adnan ldquoA comparative studyon some methods for handling multicollinearity problemsrdquoMatematika UTM vol 22 pp 109ndash119 2006
[17] D R Jensen and D E Ramirez ldquoAnomalies in the foundationsof ridge regressionrdquo International Statistical Review vol 76 pp89ndash105 2008
[18] D R Jensen and D E Ramirez ldquoSurrogate models in ill-conditioned systemsrdquo Journal of Statistical Planning and Infer-ence vol 140 no 7 pp 2069ndash2077 2010
[19] P F Velleman and R E Welsch ldquoEfficient computing ofregression diagnosticsrdquo The American Statistician vol 35 no4 pp 234ndash242 1981
[20] R F Gunst ldquoRegression analysis with multicollinear predictorvariables definition detection and effectsrdquo Communications inStatistics A vol 12 no 19 pp 2217ndash2260 1983
[21] G W Stewart ldquoComment well-conditioned collinearityindicesrdquo Statistical Science vol 2 no 1 pp 86ndash91 1987
[22] D A Belsley ldquoCentering the constant first-differencing andassessing conditioningrdquo in Model Reliability D A Belsley andE Kuh Eds pp 117ndash153 MIT Press Cambridge Mass USA1986
[23] G Smith and F Campbell ldquoA critique of some ridge regressionmethodsrdquo Journal of the American Statistical Association vol 75no 369 pp 74ndash103 1980
[24] R M Orsquobrien ldquoA caution regarding rules of thumb for varianceinflation factorsrdquoQualityampQuantity vol 41 no 5 pp 673ndash6902007
[25] R F Gunst ldquoComment toward a balanced assessment ofcollinearity diagnosticsrdquo The American Statistician vol 38 pp79ndash82 1984
[26] F A Graybill Introduction to Matrices with Applications inStatistics Wadsworth Belmont Calif USA 1969
[27] D Sengupta and P Bhimasankaram ldquoOn the roles of observa-tions in collinearity in the linearmodelrdquo Journal of the AmericanStatistical Association vol 92 no 439 pp 1024ndash1032 1997
[28] J O van Vuuren and W J Conradie ldquoDiagnostics and plotsfor identifying collinearity-influential observations and for clas-sifying multivariate outliersrdquo South African Statistical Journalvol 34 no 2 pp 135ndash161 2000
[29] R R HockingMethods and Applications of LinearModels JohnWiley amp Sons Hoboken NJ USA 2nd edition 2003
[30] R R Hocking and O J Pendleton ldquoThe regression dilemmardquoCommunications in Statistics vol 12 pp 497ndash527 1983
6 Advances in Decision Sciences
this occurrence Specifically the smaller the angle the greaterthe extent of such collinearity
Instead of conventional VIF119906s given the foregoing dis-
claimer we have the following amended version as Referencefor uncentered diagnostics altering whatmay be changed butleaving X = 0 intact
Definition 8 Given a model in 1198720with second-moment
matrixX10158400X0 the amendedReferencemodel for assessing119881perp-
orthogonality is R119881
= [ 119899 119899X1015840
119899X D] with D = Diag(119889
11 119889
119896119896)
as diagonal elements of X1015840X provided that R119881is positive
definite We identify a model to be 119881perp-orthogonal when
X10158400X0= R119881
As anticipated a prospective R119881
fails to conform toexperimental data if not positive definite These and furtherprospects are covered in the following where 120601
119894designates
the angle between (1119899X119894) 119894 = 1 119896
Lemma 9 Take R119881
as a prospective Reference for 119881perp-
orthogonality
(i) In order that R119881maybe positive definite it is necessary
that 119899X1015840Dminus1X lt 1 that is that 119899sum119896
119894=1(1198832
119894119889119894119894) lt 1
(ii) Equivalently it is necessary thatsum119896119894=1
cos2(120601119894) lt 1with
120601119894as the angle between (1
119899Xi) 119894 = 1 119896
(iii) The Reference variance for 1205730is Var(120573
0| X1015840X = D) =
[119899(1 minus 119899X1015840Dminus1X)]minus1
(iv) The Reference variances for slopes are given by
Var (120573119894| X1015840X = D) =
1
119889119894119894
[1 +1205741198832
119894
119889119894119894
] 1 le 119894 le 119896 (11)
where 120574 = 119899(1 minus 119899X1015840Dminus1X)
Proof The clockwise rule for determinants gives |R119881| =
|D|[119899 minus 1198992X1015840Dminus1X] Conclusion (i) follows since |D| gt 0
The computation
cos (120601119894) =
11015840119899X119894
1003817100381710038171003817111989910038171003817100381710038171003817100381710038171003817X119894
1003817100381710038171003817=
11015840119899X119894
radic119899119889119894119894
= radic1198991198832
119894
119889119894119894
(12)
in parallel with (10) gives conclusion (ii) Using the clockwiserule for block-partitioned inverses the (1 1) element of Rminus1
119881
is given by conclusion (iii) Similarly the lower right block ofRminus1119881 of order (119896 times 119896) is the inverse of H = D minus 119899XX1015840 On
identifying a = b = X 120572 = 119899 and 119886lowast
119894= 119886119894119889119894119894 in Theorem
833 of [26] we have that Hminus1 = Dminus1 + 120574Dminus1XX1015840Dminus1Conclusion (iv) follows on extracting its diagonal elementsto complete our proof
Corollary 10 For the case 119896 = 2 in order that R119881maybe
positive definite it is necessary that 1206011+ 1206012gt 90 deg
Proof Beginning with Lemma 9(ii) compute
cos (1206011+ 1206012)
= cos1206011cos1206012minus sin120601
1sin1206012
= cos1206011cos1206012minus[1minus(cos2120601
1+cos2120601
2)+cos2120601
1cos21206012]12
(13)
which is lt0 when 1 minus (cos21206011+ cos2120601
2) gt 0
Moreover the matrix R119881itself is intrinsically ill conditioned
owing to X = 0 its condition number depending on X Toquantify this dependence we have the following wherecolumns of X have been standardized to common lengths|X119894| = 119888 1 le 119894 le 119896
Lemma 11 Let R119881= [ 119899 T1015840
T 119888I119896
] as in Definition 8 with T = 119899XD = 119888I
119896 and 119888 gt 0
(i) The ordered eigenvalues of R119881are [120582
1 119888 119888 120582
119901]
where 119901 = 119896 + 1 and (1205821 120582119901) are the roots of 120582 isin
[(119899 + 119888) plusmn radic(119899 minus 119888)2+ 41205912]2 and 120591
2= T1015840T
(ii) The roots are positive andR119881is positive definite if and
only if (119899119888 minus 1205912) gt 0
(iii) If R119881
is positive definite its condition number is1198881(R119881) = 1205821120582119901and is increasing in 120591
2
Proof Eigenvalues are roots of the determinantal equation
Det [(119899 minus 120582) T1015840T (119888 minus 120582) I
119896
]
= 0 lArrrArr (119888 minus 120582)119896minus1
[(119899 minus 120582) (119888 minus 120582) minus T1015840T] = 0
(14)
from the clockwise rule giving (119896 minus 1) values 120582 = 119888 and tworoots of the quadratic equation 120582
2minus (119899 + 119888)120582 + 119899119888 minus 120591
2= 0
to give conclusion (i) Conclusion (ii) holds since the productof roots of the quadratic equation is (119899119888 minus 120591
2) and the greater
root is positive Conclusion (iii) follows directly to completeour proof
(119894119894) 119862perp Reference Model As noted the notion of 119862
perp-or-thogonality applies exclusively for models in 119872
0 Accord-
ingly as Reference we seek to alter X1015840X so that the matrixcomprising sums of squares and sums of products of devia-tions from means thus altered is diagonal To achieve thiscanon of 119862
perp-orthogonality and to anticipate notation forsubsequent use we have the following
Definition 12 (i) For a model in 1198720with second-moment
matrix X10158400X0and inverse as in (6) let C
119908= (W minus M) and
identify an indicator vector G = [11989212 11989213 119892
1119896 11989223
1198922119896 119892
119896minus1119896] of order ((119896 minus 1) times (119896 minus 2)) in lexicographic
order where 119892119894119895= 0 119894 = 119895 if the (119894 119895) element of C
119908is zero
and 119892119894119895= 1 119894 = 119895 if the (119894 119895) element ofC
119908is (X1015840119894X119895minus119899119883119894119883119895)
that is unchanged from C = (X1015840X minus 119899XX1015840) = Z1015840Z withZ = B
119899X as the regressors centered to their means
Advances in Decision Sciences 7
(ii) In particular the Reference model in1198720for assessing
119862perp-orthogonality is R
119862= [ 119899 119899X
1015840
119899X W] such that C
119908= (W minus
M) and its inverse G from (6) are diagonal that is takingW = [119908
119894119895] such that 119908
119894119894= 119889119894119894 1 le 119894 le 119896 and 119908
119894119895=
119898119894119895 for all 119894 = 119895 or equivalently G = [0 0 0] In this
case we identify the model to be 119862perp-orthogonal
Recall that conventional VIF119888s for [120573
1 1205732 120573
119896] are
ratios of diagonal elements of (Z1015840Z)minus1 to reciprocals ofthe diagonal elements of the centered Z1015840Z Apparently thisrests on the logic that 119862perp-orthogonality in 119872
0implies that
Z1015840Z is diagonal However the converse fails and instead isembodied in R
119862isin 1198720 In consequence conventional VIF
119888s
are deficient in applying only to slopes whereas the VF119888s
resulting from Definition 12(ii) apply informatively for all of[1205730 1205731 1205732 120573
119896]
Definition 13 Designate VF119907(sdot) and VF
119888(sdot) as Variance Fac-
tors resulting from Reference models of Definition 8 andDefinition 12(ii) respectively On occasion these representVariance Deflation in addition to VIFs
Essential invariance properties pertain ConventionalVIF119888s are scale invariant see Section 3 of [9]We see next that
they are translation invariant as well To these ends we shiftthe columns of X=[X
1X2 X
119896] to a new origin through
X rarr X minus L = H where L = [11988811119899 11988821119899 119888
1198961119899] The
resulting model is
Y = (12057301119899+ L120573) +H120573 + 120598
lArrrArr Y = (1205730+ 120595) 1
119899+H120573 + 120598
(15)
thus preserving slopes where120595 = c1015840120573=(11988811205731+11988821205732+sdot sdot sdot+119888
119896120573119896)
Corresponding to X0= [1119899X] is U
0= [1119899H] and to X1015840
0X0
and its inverse are
U10158400U0= [
119899 11015840119899H
H10158401119899
H1015840H] (U10158400U0)minus1
= [11988700
b01
b101584001
G ] (16)
in the form of (6)This pertains to subsequent developmentsand basic invariance results emerge as follows
Lemma 14 Consider Y = 12057301119899+ X120573 + 120598 together with the
shifted version Y = (1205730+ 120595)1
119899+ H120573 + 120576 both in 119872
0 Then
the matrices G appearing in (6) and (16) are identical
Proof Rules for block-partitioned inverses again assert thatG of (16) is the inverse of
(H1015840H minus 119899minus1H10158401
11989911015840119899H) = H1015840 (I
119899minus 119899minus1111989911015840119899)H
= H1015840B119899H = X1015840B
119899X
(17)
since B119899L = 0 to complete our proof
These facts in turn support subsequent claims that cen-tered VIF
119888s are translation and scale invariant for slopes
1015840
=
[1205731 1205732 120573
119896] apart from the intercept 120573
0
43 A Critique Again we distinguish VIF119888s and VIF
119906s from
centered and uncentered regressorsThe following commentsapply
(C1) A design is either 119881perp or 119862perp-orthogonal respectivelyaccording as the lower right (119896times119896) blockX1015840X ofX1015840
0X0
orC = (X1015840Xminus119899XX1015840) from expression (6) is diagonalOrthogonalities of type 119881perp and 119862
perp are exclusive andhence work at crossed purposes
(C2) In particular 119862perp-orthogonality holds if and only ifthe columns of X are 119881
perp-nonorthogonal If X is 119862perp-orthogonal then for 119894 = 119895 C
119908(119894119895) = X1015840
119894X119895minus 119899119883119894119883119895=
0 and X1015840119894X119895
= 0 Conversely if X indeed is 119881perp-
orthogonal so that X1015840X = D= Diag(11988911 119889
119896119896)
then (D minus 119899XX1015840) cannot be diagonal as Reference inwhich case the conventional VIF
119888s are tenuous
(C3) Conventional VIF119906s based on uncentered X in 119872
0
with X = 0 do not gage variance inflation as claimedfounded on the false tenet that X1015840
0X0can be diagonal
(C4) To detect influential observations and to classify highleverage points casendashinfluence diagnostics namely119891119895119868
= VIF119895(minus119868)
VIF119895 1 le 119895 le 119896 are studied in [27]
for assessing the impact of subsets on variance infla-tion Here VIF
119895is from the full data and VIF
119895(minus119868)on
deleting observations in the index set 119868 Similarly [28]proposes using VIF
(119894)on deleting the 119894th observation
In the present context these would gain substance onmodifying VF
119907and VF
119888accordingly
5 Case Study 1 Continued
51 The Setting We continue an elementary and transparentexample to illustrate essentials Recall119872
0 119884119894= 1205730+12057311198831+
12057321198832+120598119894 of Section 22 the designX
0= [15X1X2] of order
(5 times 3) and U = X10158400X0 and its inverse V = (X10158400X0)minus1 as in
expressions (1) The design is neither 119881perp nor 119862perp orthogonalsince neither the lower right (2 times 2) block of X1015840
0X0nor the
centered Z1015840Z is diagonal Moreover the uncentered VIF119906s as
listed in (3) are not the vaunted relative increases in variancesowing to nonorthogonal columns of X
0 Indeed the only
opportunity for 119881perp-orthogonality here is between columns
[X1X2]
Nonetheless from Section 41 we utilize Theorem 4(i)and (10) to connect the collinearity indices of [9] to anglesbetween subspaces Specifically Minitab recovered valuesfor the residuals RS
119895= x
119895minus P119895x1198952 119895 = 0 1 2 fur-
ther computations proceed routinely as listed in Table 1 Inparticular the principal angle 120579
0between the constant vector
and the span of the regressor vectors is 1205790= 31751 deg as
anticipated in Remark 7
52 119881perp-Orthogonality For subsequent reference let 119863(119909) bea design as in (1) but with second-moment matrix
U (119909) = [
[
5 3 1
3 25 119909
1 119909 3
]
]
(18)
8 Advances in Decision Sciences
Table 1 Values for residuals RS119895= x119895minus P119895x1198952 for X
1198952 for 1205812
119906119895= VI F
119906(120573119895) for (1 minus 1120581
2
119906119895)12 and for 120579
119895 for 119895 = 0 1 2
119895 RS119895
X1198952
1205812
119895(1 minus 1120581
2
119906119895)12
120579119895(deg)
0 13846 50 36111 08503 317511 06429 25 38889 08619 304702 25714 30 11667 03780 67792
Invoking Definition 8 we seek a 119881perp-orthogonal Reference
with moment matrix U(0) having X10158401X2
= 0 that is 119863(0)This is found constructively on retaining [1
5 ∙X2] in X
0
but replacing X1by X01
= [1 05 05 1 0]1015840 as listed in
Table 2 giving the design 119863(0) with columns [15X01X2] as
ReferenceAccordingly R
119881= U(0) is ldquoas 119881
perp-orthogonal as it cangetrdquo given the lengths and sums of (X
1X2) as prescribed
in the experiment and as preserved in the Reference modelLemma 9 with X=[06 02]
1015840 and D= Diag(25 3) givesX1015840Dminus1X = 0157333 and 120574 = 5(1 minus 5X1015840Dminus1X) =
234375 Applications of Lemma 9(iii)-(iv) in succession givethe reference variances
Var (1205730| X1015840X = D) = [5 (1 minus 5X1015840Dminus1X)]
minus1
= 09375
Var (1205731|X1015840X=D)=
1
11988911
[1+1205741198832
1
11988911
]=2
5[1+
072120574
5]=17500
Var (1205732|X1015840X=D)=
1
11988922
[1+1205741198832
2
11988922
]=1
3[1+
004120574
3]=04375
(19)
As Reference these combine with actual variances from119881 = [119881
119894119895] at (1) giving VF
119907for the original designX
0relative
to R119881 For example VF
119907(1205730) = 0722209375 = 07704 with
11988111
= 07222 from (1) and 09375 from (19) This contrastswith VIF
119906(1205730) = 36111 in (3) as in Section 22 and Table 2
it is noteworthy that the nonorthogonal design 119863(1) yieldsuniformly smaller variances than the 119881
perp-orthogonal R119881
namely119863(0)Further versions 119863(119909) 119909 isin [minus05 00 05 06 10 15]
of our basic design are listed for comparison in Table 2 where119909 = X1015840
1X2for the various designs The designs themselves are
exhibited on transposing rowsX10158401= [11988311 11988312 11988313 11988314 11988315]
from Table 2 and substituting each into the template X00
=
[15 ∙X2] The design 119863(2) was constructed but not listed
since its X10158400X0is not invertible Clearly these are actual
designs amenable to experimental deployment
53 119862perp-Orthogonality Continuing X0
= [15X1X2] and
invoking Definition 12(ii) we seek a 119862perp-orthogonal
Reference having the matrix C119908
as diagonal This is
identified as 119863(06) in Table 2 From this the matrix R119862and
its inverse are
R119862= [
[
5 3 1
3 25 06
1 06 3
]
]
Rminus1119862
= [
[
07286 minus08572 minus00714
minus08571 14286 00
minus00714 00 03571
]
]
(20)
The variance factors are listed in (4) where for exampleVF119888(1205732) = 0388903571 = 10889 As distinct from conven-
tional VIF119888s for 120573
1and 120573
2only our VF
119888(1205730) here reflects
Variance Deflationwherein 1205730is estimated more precisely in
the initial 119862perp-nonorthogonal designA further note on orthogonality is germane Suppose
that the actual experiment is 119863(0) yet towards a thoroughdiagnosis the user evaluates the conventional VIF
119888(1205731) =
12250 = VIF119888(1205732) as ratios of variances of 119863(0) to 119863(06)
from Table 2 Unfortunately their meaning is obscured sincea design cannot at once be 119881perp and 119862
perp-orthogonalAs in Remark 2 we next alter the basic designX
0at (1) on
shifting columns of themeasurements to [0 0] asminima andscaling to have squared lengths equal to 119899 = 5 The resultingX0follows directly from (1) The new matrix U = X
1015840
0X0 andits inverse V are
U = [
[
5 42426 42426
42426 5 4
42426 4 5
]
]
V = [
[
1 minus04714 minus04714
minus04714 07778 minus02222
minus04714 minus02222 07778
]
]
(21)
giving conventional VIF119906s as [5 38889 38889] Against 119862perp-
orthogonality this gives the diagnostics
[VF119888(1205730) VF119888(1205731) VF119888(1205732)] = [08140 10889 10889]
(22)
demonstrating in comparison with (4) that VF119888s apart from
1205730 are invariant under translation and scaling a consequence
of Lemma 14We further seek to compare variances for the shifted
and scaled (X1X2) against 119881perp-orthogonality as Reference
However from U = X10158400X0at (21) we determine that 119883
1=
084853 = 1198832and 5[(119883
2
15) + (119883
2
25)] = 14400 Since
the latter exceeds unity we infer from Lemma 9(i) that
Advances in Decision Sciences 9
Table 2 Designs 119863(119909) 119909 isin [minus05 00 05 06 10 15] obtained by substitutingX01= [11988311 11988312 11988313 11988314 11988315]1015840 in [15 ∙X2] of order (5times3)
and corresponding variances Var (1205730)Var (120573
1)Var (120573
2)
Design 11988311
11988312
11988313
11988314
11988315
Var (1205730) Var (120573
1) Var (120573
2)
119863(minus05) 10 00 05 10 05 19333 37333 09333119863(00) 10 05 05 10 00 09375 17500 04375119863(05) 05 10 00 10 05 07436 14359 03590119863(06) 06091 05793 06298 11819 0000 07286 14286 03570119863(10) 00 05 05 10 10 07222 15556 03889119863(15) 00 10 05 05 10 09130 24348 06087
119881perp-orthogonality is incompatible with this configuration
of the data so that the VF119907s are undefined Equivalently
on evaluating 1206011
= 1206012
= 2290 deg and 1206011+ 1206012
=
4580 lt 90 deg Corollary 10 asserts that R119881is not positive
definite This appears to be anomalous until we recallthat 119881perp-orthogonality is invariant under rescaling but notrecentering the regressors
54 A Critique We reiterate apparent ambiguity ascribedin the literature to orthogonality A number of widely heldguiding precepts has evolved as enumerated in part in ouropening paragraph and in Section 3 As in Remark 3 theseclearly have been taken to apply verbatim forX
0andX1015840
0X0in
1198720 to be paraphrased in part as follows(P1) Ill-conditioning espouses inflated variances that is
VIFs necessarily equal or exceed unity(P2) 119862perp-orthogonal designs are ldquoidealrdquo in that VIF
119888s for
such designs are all unity see [11] for exampleWe next reassess these precepts as they play out under119881perp and 119862
perp orthogonality in connection with Table 2(C1) For models in 119872
0having X = 0 we reiterate that
the uncentered VIF119906s at expression (3) namely
[36111 38889 11667] overstate adverse effects onvariances of the 119881perp-nonorthogonal array since X1015840
0X0
cannot be diagonal Moreover these conventionalvalues fail to discern that the revised values for VF
119907s
at (5) namely [07704 08889 08889] reflect that[1205730 1205731 1205732] are estimated with greater efficiency in the
119881perp-nonorthogonal array119863(1)
(C2) For 119881perp-orthogonality the claim P1 thus is false by
counterexampleTheVFs at (5) areVarianceDeflationFactors for design 119863(1) where X1015840
1X2= 1 relative to
the 119881perp-orthogonal119863(0)(C3) Additionally the variances Var[120573
0(119909)] Var[120573
1(119909)]
Var[1205732(119909)] in Table 2 are all seen to decrease from
119863(0) [09375 17500 04375] rarr 119863(1) [0722215556 03889] despite the transition from the 119881
perp-orthogonal 119863(0) to the 119881
perp-nonorthogonal 119863(1)Similar trends are confirmed in other ill-conditioneddata sets from the literature
(C4) For 119862perp-orthogonal designs the claim P2 is false by
counterexample Such designs need not be ldquoidealrdquoas 1205730may be estimated more efficiently in a 119862
perp-nonorthogonal design as demonstrated by the VF
119888s
for 1205730at (4) and (22) Similar trends are confirmed
elsewhere as seen subsequently
(C5) VFs for 1205730are critical in prediction where prediction
variances necessarily depend on Var(1205730) especially
for predicting near the origin of the system of coor-dinates
(C6) Dissonance between 119881perp and 119862
perp is seen in Table 2where 119863(06) as 119862
perp-orthogonal with X10158401X2
= 06is the antithesis of 119881perp-orthogonality at 119863(0) whereX10158401X2= 0
(C7) In short these transparent examples serve to dispelthe decades-old mantra that ill-conditioning neces-sarily spawns inflated variances formodels in119872
0 and
they serve to illuminate the contributing structures
6 Orthogonal and Linked Arrays
A genuine 119881perp-orthogonal array was generated as eigenvec-
tors E = [E1E2 E
8] from a positive definite (8 times 8)
matrix (see Table 83 of [11 p377]) using PROC IML of theSAS programming package The columns [X
1X2X3] as the
second through fourth columns of Table 3 comprise the firstthree eigenvectors scaled to length 8 These apply for modelsin 119872119908and 119872
0to be analyzed In addition linked vectors
(Z1Z2Z3) were constructed as Z
1= 119888(09E
1+ 01E
2)
Z2
= 119888(09E2+ 01E
3) and Z
3= 119888(09E
3+ 01E
4) where
119888 = radic8082 Clearly these arrays as listed in Table 3 are notarchaic abstractions as both are amenable to experimentalimplementation
61 The Model 1198720 We consider in turn the orthogonal and
the linked series
611 Orthogonal Data Matrices X10158400X0and (X1015840
0X0)minus1 for the
orthogonal data under model1198720are listed in Table 4 where
variances occupy diagonals of (X10158400X0)minus1 The conventional
uncentered VIF119906s are [120296 102144 112256 105888]
Since X = 0 we find the angle between the constant andthe span of the regressor vectors to be 120579
0= 65748 deg as
in Theorem 4(ii) Moreover the angle between X1and the
span of [1119899X2X3] namely 120579
1= 81670 deg is not 90 deg
because of collinearity with the constant vector despite themutual orthogonality of [X
1X2X3] Observe here thatX1015840
0X0
10 Advances in Decision Sciences
Table 3 Basic 119881perp-orthogonal design X0 = [18 X1 X2 X3] and a linked design Z0 = [18 Z1 Z2 Z3] of order (8 times 4)
Orthogonal (X1X2X3) Linked (Z
1Z2Z3)
18
X1
X2
X3
18
Z1
Z2
Z3
1 minus1084470 0056899 0541778 1 minus1071550 0116381 06534861 minus1090010 minus0040490 0117026 1 minus1087820 minus0027320 01160861 0979050 0002564 2337454 1 0973345 0260677 21996871 minus1090230 0016839 0170711 1 minus1081700 0035588 00706941 0127352 2821941 minus0071670 1 0438204 2796767 minus00707101 1045409 minus0122670 minus1476870 1 1025468 minus0285010 minus16185601 1090803 minus0088970 0043354 1 1074306 minus0083640 01769281 1090739 minus0092300 0108637 1 1073875 minus0079740 0244567
Table 4 Matrices X10158400X0 and (X10158400X0)minus1 for the orthogonal data under model119872
0
X10158400X0
(X10158400X0)minus1
8 10686 25538 17704 01504 minus00201 minus00480 minus0033310686 8 0 0 minus00201 01277 00064 0004425538 0 8 0 minus00480 00064 01403 0010617704 0 0 8 minus00333 00044 00106 01324
already is 119881perp-orthogonal accordingly R119881= X10158400X0 and thus
the VF119907s are all unity
In view of dissonance between 119881perp and 119862
perp orthogonalitythe sums of squares and products for the mean-centered Xare
X1015840B119899X = [
[
78572 minus03411 minus02365
minus03411 71847 minus05652
minus02365 minus05652 76082
]
]
(23)
reflecting nonnegligible negative dependencies to distin-guish the 119881
perp-orthogonal X from the 119862perp-nonorthogonal
matrix Z = B119899X of deviations
To continue we suppose instead that the 119881perp-orthogonalmodel was to be recast as 119862
perp-orthogonal The Referencemodel and its inverse are listed in Table 5 Variance factorsas ratios of diagonal elements of (X1015840
0X0)minus1 in Table 4 to those
of Rminus1119862
in Table 5 are
[VF119888(1205730) VF119888(1205731) VF119888(1205732) VF119888(1205733)]
= [10168 10032 10082 10070]
(24)
reflecting negligible differences in precision This parallelsresults reported in Table 2 comparing 119863(0) to 119863(06) asReference except for larger differences in Table 2 namely[128681225012250]
612 Linked Data For the Linked Data under model 1198720
the matrix Z10158400Z0 follows routinely from Table 3 its inverse(Z10158400Z0)minus1 is listed in Table 6 The conventional uncentered
VIF119906s are now [120320 103408 113760 105480] These
again fail to gage variance inflation since Z10158400Z0cannot be
diagonal From (10) the principal angle between the constantvector and the span of the regressor vectors is 120579
0= 65735
degrees
Remark 15 Observe that the choice for R119862may be posed as
seeking R119862
= U(119909) such that Rminus1119862(2 3) = 00 then solving
numerically for 119909 using Maple for example However thealgorithm in Definition 12(ii) affords a direct solution thatC119908should be diagonal stipulates its off-diagonal element as
X10158401X2minus 35 = 0 so that X1015840
1X2= 06 in R
119862at expression (20)
To illustrate Theorem 4(iii) we compute cos(120579) = (1 minus
110889)12 = 02857 and 120579 = 73398 deg as the angle between
the vectors [X1X2] of (1) when centered to their means For
slopes in 1198720 [9] shows that VIF
119888(120573119895) lt VIF
119906(120573119895) 1 le
119894 le 119896 This follows since their numerators are equal butdenominators are reciprocals of lengths of the centered anduncentered regressor vectors To illustrate numerators areequal for VIF
119906(1205733) = 03889(13) = 11667 and for
VIF119888(1205733) = 03889(128) = 10889 but denominators are
reciprocals of X10158402X2= 3 and sum
5
119895=1(1198832119895
minus 1198832)2= 28
(119894) 119881perp Reference Model To rectify this malapropism we seek
in Definition 8 a model R119881as Reference This is found on
setting all off-diagonal elements of Z10158400Z0to zero excluding
the first row and column Its inverse Rminus1119881
is listed in Table 6The VF
119907s are ratios of diagonal elements of (Z1015840
0Z0)minus1 on the
left to diagonal elements of Rminus1119881
on the right namely
[VF119907(1205730) VF119907(1205731) VF119907(1205732) VF119907(1205733)]
= [09697 09991 09936 09943]
(25)
Against conventional wisdom it is counter-intuitive that[1205730 1205731 1205732 1205733] are all estimated with greater precision in the
119881perp-nonorthogonal model Z1015840
0Z0than in the 119881
perp-orthogonalmodel R
119881of Definition 8 As in Section 54 this serves again
to refute the tenet that ill-conditioning espouses inflated
Advances in Decision Sciences 11
Table 5 Matrices R119862and Rminus1
119862for checking 119862
perp-orthogonality in the orthogonal data X0 under model1198720
R119862
Rminus1119862
8 10686 25538 17704 01479 minus00170 minus00444 minus0029110686 8 03411 02365 minus00170 01273 0 025538 03411 8 05652 minus00444 0 01392 017704 02365 05652 8 minus00291 0 0 01314
Table 6 Matrices (Z10158400Z0)minus1 and Rminus1
119881for checking 119881
perp-orthogonality for the linked data under model1198720
(Z10158400Z0)minus1 Rminus1
119881
01504 minus00202 minus00461 minus00283 01551 minus00261 minus00530 minus00344minus00202 01293 minus000797 00053 minus00261 01294 00089 00058minus00461 minus00079 01422 minus00054 minus00530 00089 01431 00117minus00283 00053 minus00054 01319 minus00344 00058 00117 01326
variances for models in1198720 that is that VFs necessarily equal
or exceed unity(119894119894) 119862
perp Reference Model Further consider variances werethe Linked Data to be centered As in Definition 12(ii)we seek a Reference model R
119862such that C
119908= (W minus
M) is diagonal The required R119862
and its inverse arelisted in Table 7 Since variances in the Linked Data are[015040 012926 014220 013185] from Table 6 and Refer-ence values appear as diagonal elements of Rminus1
119862on the right
in Table 7 their ratios give the centered VF119888s as
[VF119888(1205730) VF119888(1205731) VF119888(1205732) VF119888(1205733)]
= [09920 10049 10047 10030]
(26)
We infer that changes in precision would be negligible ifinstead the Linked Data experiment was to be recast as a 119862perp-orthogonal experiment
As in Remark 15 the reference matrix R119862is constrained
by the first and second moments from X10158400X0and has free
parameters 1199091 1199092 1199093 for the (2 3) (2 4) (3 4) entries
The constraints for 119862perp-orthogonality are Rminus1
119862(2 3) =
Rminus1119862(2 4) = Rminus1
119862(3 4) = 0 Although this system of
equations can be solved with numerical software such asMaple the algorithm stated in Definition 12 easily yields thedirect solution given here
62 The Model 119872119908 For the Linked Data take 119884
119894= 12057311198851198941+
12057321198851198942
+ 12057331198851198943
+ 120598119894 1 le 119894 le 8 as Y = Z120573 + 120598
where the expected response at 1198851
= 1198852
= 1198853
= 0
is 119864(119884) = 00 and there is no occasion to translate theregressorsThe lower right (3 times 3) submatrix ofZ1015840
0Z0(4 times 4)
from 1198720is Z1015840Z its inverse is listed in Table 8 The ratios
of diagonal elements of (Z1015840Z)minus1 to reciprocals of diagonalelements of Z1015840Z are the conventional uncentered VIF
119906s
namely [10123 10247 10123] Equivalently scaling Z1015840Z tohave unit diagonals and inverting gives Cminus1 as in Table 8 itsdiagonal elements are the VIF
119906sThroughout Section 6 these
comprise the only correct interpretation of conventionalVIF119906s as genuine variance ratios
7 Case Study Body Fat Data
71 The Setting Body fat and morphogenic measurementswere reported for 119899 = 20 human subjects in Table D11 of[29 p717] Response and regressor variables are119884 amount ofbody fat X
1 triceps skinfold thickness X
2 thigh circumfer-
ence and X3 mid-arm circumference under119872
0 119884119894= 1205730+
12057311198831+12057321198832+12057331198833+ 120598119894 From the original data as reported
the condition number of X10158400X0is 1039931 and the VIF
119906s are
[2714800 4469419 630998 778703] On scaling columnsof X to equal column lengths X
119894 = 50 119894 = 1 2 3 the
resulting condition number of the revisedX10158400X0is 2 9696518
and the uncenteredVIF119906s are as before from invariance under
scalingAgainst complaints that regressors are remote from their
natural origins we center regressors to [0 0 0] as origin onsubtracting the minimal element from each column as inRemark 2 and scaling these to X
119894 = 50 119894 = 1 2 3
This is motivated on grounds that centered diagnostics apartfrom VIF
119888(1205730) are invariant under shifts and scalings from
Lemma 14 In addition under the new origin [0 0 0] 1205730
assumes prominence as the baseline response against whichchanges due to regressors are to be gaged The original dataare available on the publisherrsquos web page for [29]
Taking these shifted and scaled vectors as columnsof X = [X
1X2X3] in the revised X
0= [1
119899X]
values for X10158400X0
and its inverse are listed in Table 9The condition number of X1015840
0X0
is now 1136969 andits VIF
119906s are [6775617998742782174484] Additionally
from Theorem 4(ii) we recover the angle 1205790= 22592 deg to
quantify collinearity of regressors with the constant vectorIn addition the scaled and centered correlationmatrix for
X is
R = [
[
10 00847 08781
00847 10 01424
08781 01424 10
]
]
(27)
indicating strong linkage between mean-centered versions of(X1X3)Moreover the experimental design is neither119881perp nor
119862perp-orthogonal since neither the lower right (3 times 3) block of
X10158400X0nor the centered matrixC = (X1015840Xminus119899XX1015840) is diagonal
12 Advances in Decision Sciences
Table 7 Matrices R119862and Rminus1
119862for checking 119862
perp-orthogonality in the linked data under model1198720
R119862
Rminus1119862
8 13441 27337 17722 01516 minus00216 minus00484 minus0029113441 8 04593 02977 minus00216 01286 0 027337 04593 8 06056 minus00484 0 01415 017722 02977 06056 8 minus00291 0 0 01314
Table 8 Matrices (Z1015840Z)minus1 and Cminus1 for the linked data under model119872119908
(Z1015840Z)minus1 Cminus1
01265 minus00141 00015 10123 minus01125 00123minus00141 01281 minus00141 minus01125 10247 minus0112500015 minus00141 01265 00123 minus01125 10123
(119894) 119881perp Reference Model To compare this with a 119881
perp-or-thogonal design we turn again to Definition 8 Howeverevaluating the test criterion in Lemma 9(119894) shows that thisconfiguration is not commensurate with 119881
perp-orthogonalityso that the VF
119907s are undefined Comparisons with 119862
perp-orthogonal models are addressed next(119894119894) 119862perp Reference Model As in Definition 12(ii) the centering
matrix is
119899XX1015840 = [
[
188889 189402 187499
189402 189916 188007
187499 188007 186118
]
]
(28)
To check against 119862perp-orthogonality we obtain the matrix Wof Definition 12(ii) on replacing off-diagonal elements of thelower right (3 times 3) submatrix ofX1015840
0X0by corresponding off-
diagonal elements of 119899XX1015840 The result is the matrix R119862and
its inverse as given in Table 10 where diagonal elements ofRminus1119862
are the Reference variances The VF119888s found as ratios
of diagonal elements of (X10158400X0)minus1 relative to those of Rminus1
119862
are listed in Table 12 under G = [0 0 0] the indicator ofDefinition 12(i) for this case
72 Variance Factors and Linkage Traditional gages of ill-conditioning are patently absurd on occasion By conventionVIFs are ldquoall or nonerdquo wherein Reference models entailstrictly diagonal components In practice some regressorsare inextricably linked to get or even visualize orthogonalregressors may go beyond feasible experimental rangesrequire extrapolation beyond practical limits and challengecredulity Response functions so constrained are described in[30] as ldquopicket fencesrdquo In such cases taking C to be diagonalas its 119862perp Reference is moot at best an academic folly abettedin turn by default in standard software packages In shortconventional diagnostics here offer answers to irrelevantquestions Given pairs of regressors irrevocably linked weseek instead to assess effects on variances for other pairsthat could be unlinked by design These comments applyespecially to entries under G = [0 0 0] in Table 12
To proceed we define essential ill-conditioning as regres-sors inextricably linked necessarily to remain so and asnonessential if regressors could be unlinked by design As a
followup to Section 32 for X = 0 we infer that the constantvector is inextricably linked with regressors thus accountingfor essential ill-conditioning not removed by centeringcontrary to claims in [4] Indeed this is the essence ofDefinition 8
Fortunately these limitations may be remedied throughReference models adapted to this purpose This in turnexploits the special notation and trappings of Definition 12(i)as follows In addition toG = [0 0 0] we iterate for indicatorsG = [119892
12 11989213 11989223] taking values in [1 0 0] [0 1 0] [0 0 1]
[1 1 0] [1 0 1] [0 1 1] Values of VF119888s thus obtained are
listed in Table 12 To fix ideas the Reference R119862for G =
[1 0 0] that is for the constraint R119862(2 3) = X1015840
0X0(2 3) =
194533 is given in Table 11 together with its inverse Foundas ratios of diagonal elements of (X1015840
0X0)minus1 relative to those of
Rminus1119862
in Table 11 VF119888s are listed in Table 12 underG = [1 0 0]
In short R119862is ldquoas 119862
perp-orthogonal as it could berdquo given theconstraint Other VF
119888s in Table 12 proceed similarly For all
cases in Table 12 excluding G = [0 1 1] 1205730is estimated
with greater efficiency in the original model than any of thepostulated Reference models
As in Remark 15 the reference matrix R119862
for G =
[1 0 0] is constrained by X10158401X2
= 194533 to be retainedwhereas X1015840
1X3
= 1199091and X1015840
2X3
= 1199092are to be determined
from Rminus1119862(2 4) = Rminus1
119862(3 4) = 0 Although this system
can be solved numerically as noted the algorithm stated inDefinition 12 easily yields the direct solution given here
Recall 1198831 triceps skinfold thickness 119883
2 thigh cir-
cumference and 1198833 mid-arm circumference and from
(27) that (1198831 1198833) are strongly linked Invoking the cutoff
rule [24] for VIFs in excess of 40 it is seen for G =
[0 0 0] in Table 12 that VF119888(1205731) and VF
119888(1205733) are excessive
where [X1X2X3] are forced to be mutually uncorrelated
as Reference Remarkably similar values are reported forG isin [1 0 0] [ 0 0 1] [1 0 1] all gaged against intractableReference models wherein (119883
1 1198833) are postulated to be
uncorrelated however incrediblyOn the other hand VF
119888s for G isin [0 1 0] [1 1 0]
[0 1 1] in Table 12 are likewise comparable where (X1X3)
are allowed to remain linked at X10158401X3= 242362 Here the
VF119888s reflect negligible changes in efficiency of the estimates
Advances in Decision Sciences 13
Table 9 Matrices X10158400X0 and (X10158400X0)minus1 for the body fat data under model119872
0centered to minima and scaled to equal lengths
X10158400X0
(X10158400X0)minus1
20 194365 194893 192934 03388 minus01284 minus01482 minus00203194365 25 194533 242362 minus01284 07199 00299 minus06224194893 194533 25 196832 minus01482 00299 01711 minus00494192934 242362 196832 25 minus00203 minus06224 minus00494 06979
Table 10 Matrices R119862and Rminus1
119862for the body fat data adjusted to give off-diagonal elements the value 00 with code G = [0 0 0]
R119862
Rminus1119862
20 194365 194893 192934 05083 minus01590 minus01622 minus01510194365 25 189402 187499 minus01590 01636 0 0194893 189402 25 188007 minus01622 0 01664 0192934 187499 188007 25 minus01510 0 0 01565
in comparison with the originalX10158400X0 where [X
1X2X3] are
all linked In summary negligible changes in overall efficiencywould accrue on recasting the experiment so that (X
1X2)
and (X2X3) were pairwise uncorrelated
Table 12 conveys further useful information on notingthat VFs as ratios are multiplicative and divisible Forexample take the model codedG = [1 1 0] whereX1015840
1X2and
X10158401X3are retained at their initial values under X1015840
0X0 namely
194533 and 242362 from Table 9 with (X2X3) unlinked
Now taking G = [0 1 0] as reference we extract the VFson dividing elements in Table 12 at G = [0 1 0] by those ofG = [1 1 0] The result is
[VF (1205730) VF (120573
1) VF (120573
2) VF (120573
3)]
= [09196
0984810073
1005710282
1020810208
10208]
= [09338 10017 10072 10000]
(29)
8 Conclusions
Our goal is clarity in the actual workings of VIFs as ill-conditioning diagnostics thus to identify and rectifymiscon-ceptions regarding the use of VIFs for models X
0= [1119899X]
with intercept Would-be arbiters for ldquocorrectrdquo usage havedivided between VIF
119888s for mean-centered regressors and
VIF119906s for uncentered data We take issue with both Conven-
tional but clouded vision holds that (i) VIF119906s gage variance
inflation owing to nonorthogonal columns of X0as vectors
in R119899 (ii) VIFs ge 10 and (iii) models having orthogonalmean-centered regressors are ldquoidealrdquo in possessing VIF
119888s of
value unity Reprising Remark 3 these properties widely andcorrectly understood for models in119872
119908 appear to have been
expropriated without verification for models in1198720hence the
anomaliesAccordingly we have distinguished vector-space orthog-
onality (119881perp) of columns of X0 in contrast to orthogonality
(119862perp) of mean-centered columns of X A key feature is theconstruction of second-moment arrays as Reference models(R119881R119862) encoding orthogonalities of these types and their
Variance Factors as VF119907rsquos and VF
119888rsquos respectively Contrary
to convention we demonstrate analytically and through casestudies that (i1015840) VIF
119906s do not gage variance inflation (ii1015840)
VFs ge 10 need not hold whereas VF lt 10 representsVariance Deflation and (iii1015840) models having orthogonalcentered regressors are not necessarily ldquoidealrdquo In particularvariance deflation occurs when ill-conditioned data yieldsmaller variances than corresponding orthogonal surrogates
In short our studies call out to adjudicate the prolongeddispute regarding centered and uncentered diagnostics formodels in119872
0 Our summary conclusions follow
(S1) The choice of a model with or without intercept isa substantive matter in the context of each exper-imental paradigm That choice is fixed beforehandin a particular setting is structural and is beyondthe scope of the present study Snee and Marquardt[10] correctly assert that VIFs are fully creditedand remain undisputed in models without interceptthese quantities unambiguously retain the meaningsoriginally intended namely as ratios of variances togage effects of nonorthogonal regressors For modelswith intercept the choice is between centered anduncentered diagnostics which is to be weighed asfollows
(S2) ConventionalVIF119888s apply incompletely to slopes only
not necessarily grasping fully that a system is illconditioned Of particular concern is missing evi-dence on collinearity of regressors with the interceptOn the other hand if 119862
perp-orthogonality pertainsthen VF
119888s may supplant VIF
119888s as applicable to all of
[1205730 1205731 120573
119896] Moreover the latter may reveal that
VF119888(1205730) lt 10 as in cited examples and of special
import in predicting near the origin of the coordinatesystem
(S3) Our VF119907s capture correctly the concept apparently
intended by conventional VIF119906s Specifically if 119881perp-
orthogonality pertains then VF119907s as genuine ratios
of variances may supplant VIF119906s as now discredited
gages for variance inflation(S4) Returning to Section 32 we subscribe fully but for
different reasons to Belsleyrsquos and othersrsquo contention
14 Advances in Decision Sciences
Table 11 Matrices R119862and Rminus1
119862for the body fat data adjusted to give off-diagonal elements the value 00 except X10158401X2 code G = [1 0 0]
R119862
Rminus1119862
20 194365 194893 192934 04839 minus01465 minus01497 minus01510194365 25 194533 187499 minus01465 01648 minus00141 0194893 194533 25 188007 minus01497 minus00141 01676 0192934 187499 188007 25 minus01510 0 0 01565
Table 12 Summary VF119888s for the body fat data centered to minima and scaled to common lengths identified with varying G = [119892
12 11989213 11989223]
from Definition 12
Elements of G VF119888s
11989212
11989213
11989223
1205730
1205731
1205732
1205733
0 0 0 06665 43996 10282 44586
1 0 0 07002 43681 10208 44586
0 1 0 09196 10073 10282 10208
0 0 1 07201 43996 10073 43681
1 1 0 09848 10057 10208 10208
1 0 1 07595 43681 10003 43681
0 1 1 10248 10073 10073 10160
that uncentered VIF119906s are essential in assessing ill-
conditioned data In the geometry of ill-conditioningthese should be retained in the pantheon of regres-sion diagnostics not as now debunked gages ofvariance inflation but as more compelling angularmeasures of the collinearity of each column of X
0=
[1119899X1X2 X
119896] with the subspace spanned by
remaining columns(S5) Specifically the degree of collinearity of regressors
with the intercept is quantified by 1205790deg which
if small is known to ldquocorrupt the estimates of allparameters in the model whether or not the interceptis itself of interest and whether or not the data havebeen centeredrdquo [21 p90]This in turnwould appear toelevate 120579
0in importance as a critical ill-conditioning
diagnostic(S6) In contrast Theorem 4(iii) identifies conventional
VIF119888s with angles between a centered regressor and
the subspace spanned by remaining centered regres-sors For example
1205791= arccos([
[
1 minus1
VIFc ( ldquo1205731)]
]
12
) (30)
is the angle between the centered vector Z1= B119899X1
and the remaining centered vectors These lie in thelinear subspace of R119899 comprising the orthogonalcomplement to 1
119899isin R119899 and thus bear on the
geometry of ill-conditioning in this subspace(S7) Whereas conventional VIFs are ldquoall or nonerdquo to
be gaged against strictly diagonal components ourReference models enable unlinking what may beunlinked while leaving other pairs of regressorslinked This enables us to assess effects on variances
attributed to allowable partial dependencies amongsome but not other regressors
In conclusion the foregoing summary points to limita-tions in modern software packages Minitab v15 119878119875119878119878 v19and 119878119860119878 v92 are standard software packages which computecollinearity diagnostics returning with the default optionsthe centered VIFs VIF
119888(120573119895) 1 le 119895 le 119896 To compute
the uncentered VIF119906s one needs to define a variable for
the constant term and perform a linear regression using theoptions of fitting without an intercept On the other handcomputations for VF
119888s and VF
119907s as introduced here require
additional if straightforward programming for exampleMaple and PROC IML of the SAS System as utilized here
References
[1] D A Belsley ldquoDemeaning conditioning diagnostics throughcenteringrdquo The American Statistician vol 38 no 2 pp 73ndash771984
[2] E R Mansfield and B P Helms ldquoDetecting multicollinearityrdquoThe American Statistician vol 36 no 3 pp 158ndash160 1982
[3] S D Simon and J P Lesage ldquoThe impact of collinearityinvolving the intercept term on the numerical accuracy ofregressionrdquo Computer Science in Economics and Managementvol 1 no 2 pp 137ndash152 1988
[4] D W Marquardt and R D Snee ldquoRidge regression in practicerdquoThe American Statistician vol 29 no 1 pp 3ndash20 1975
[5] K N Berk ldquoTolerance and condition in regression computa-tionsrdquo Journal of the American Statistical Association vol 72 no360 part 1 pp 863ndash866 1977
[6] A E Beaton D Rubin and J Barone ldquoThe acceptability ofregression solutions another look at computational accuracyrdquoJournal of the American Statistical Association vol 71 no 353pp 158ndash168 1976
Advances in Decision Sciences 15
[7] R B Davies and B Hutton ldquoThe effect of errors in theindependent variables in linear regressionrdquo Biometrika vol 62no 2 pp 383ndash391 1975
[8] D W Marquardt ldquoGeneralized inverses ridge regressionbiased linear estimation and nonlinear estimationrdquo Technomet-rics vol 12 no 3 pp 591ndash612 1970
[9] G W Stewart ldquoCollinearity and least squares regressionrdquoStatistical Science vol 2 no 1 pp 68ndash84 1987
[10] R D Snee and D W Marquardt ldquoComment collinearitydiagnostics depend on the domain of prediction themodel andthe datardquo The American Statistician vol 38 no 2 pp 83ndash871984
[11] R HMyers Classical andModern Regression with ApplicationsPWS-Kent Boston Mass USA 2nd edition 1990
[12] D A Belsley E Kuh and R E Welsch Regression DiagnosticsIdentifying Influential Data and Sources of Collinearity JohnWiley amp Sons New York NY USA 1980
[13] A van der Sluis ldquoCondition numbers and equilibration ofmatricesrdquo Numerische Mathematik vol 14 pp 14ndash23 1969
[14] G C S Wang ldquoHow to handle multicollinearity in regressionmodelingrdquo Journal of Business Forecasting Methods amp Systemvol 15 no 1 pp 23ndash27 1996
[15] G C S Wang and C L Jain Regression Analysis Modeling ampForecasting Graceway Flushing NY USA 2003
[16] N Adnan M H Ahmad and R Adnan ldquoA comparative studyon some methods for handling multicollinearity problemsrdquoMatematika UTM vol 22 pp 109ndash119 2006
[17] D R Jensen and D E Ramirez ldquoAnomalies in the foundationsof ridge regressionrdquo International Statistical Review vol 76 pp89ndash105 2008
[18] D R Jensen and D E Ramirez ldquoSurrogate models in ill-conditioned systemsrdquo Journal of Statistical Planning and Infer-ence vol 140 no 7 pp 2069ndash2077 2010
[19] P F Velleman and R E Welsch ldquoEfficient computing ofregression diagnosticsrdquo The American Statistician vol 35 no4 pp 234ndash242 1981
[20] R F Gunst ldquoRegression analysis with multicollinear predictorvariables definition detection and effectsrdquo Communications inStatistics A vol 12 no 19 pp 2217ndash2260 1983
[21] G W Stewart ldquoComment well-conditioned collinearityindicesrdquo Statistical Science vol 2 no 1 pp 86ndash91 1987
[22] D A Belsley ldquoCentering the constant first-differencing andassessing conditioningrdquo in Model Reliability D A Belsley andE Kuh Eds pp 117ndash153 MIT Press Cambridge Mass USA1986
[23] G Smith and F Campbell ldquoA critique of some ridge regressionmethodsrdquo Journal of the American Statistical Association vol 75no 369 pp 74ndash103 1980
[24] R M Orsquobrien ldquoA caution regarding rules of thumb for varianceinflation factorsrdquoQualityampQuantity vol 41 no 5 pp 673ndash6902007
[25] R F Gunst ldquoComment toward a balanced assessment ofcollinearity diagnosticsrdquo The American Statistician vol 38 pp79ndash82 1984
[26] F A Graybill Introduction to Matrices with Applications inStatistics Wadsworth Belmont Calif USA 1969
[27] D Sengupta and P Bhimasankaram ldquoOn the roles of observa-tions in collinearity in the linearmodelrdquo Journal of the AmericanStatistical Association vol 92 no 439 pp 1024ndash1032 1997
[28] J O van Vuuren and W J Conradie ldquoDiagnostics and plotsfor identifying collinearity-influential observations and for clas-sifying multivariate outliersrdquo South African Statistical Journalvol 34 no 2 pp 135ndash161 2000
[29] R R HockingMethods and Applications of LinearModels JohnWiley amp Sons Hoboken NJ USA 2nd edition 2003
[30] R R Hocking and O J Pendleton ldquoThe regression dilemmardquoCommunications in Statistics vol 12 pp 497ndash527 1983
Advances in Decision Sciences 7
(ii) In particular the Reference model in1198720for assessing
119862perp-orthogonality is R
119862= [ 119899 119899X
1015840
119899X W] such that C
119908= (W minus
M) and its inverse G from (6) are diagonal that is takingW = [119908
119894119895] such that 119908
119894119894= 119889119894119894 1 le 119894 le 119896 and 119908
119894119895=
119898119894119895 for all 119894 = 119895 or equivalently G = [0 0 0] In this
case we identify the model to be 119862perp-orthogonal
Recall that conventional VIF119888s for [120573
1 1205732 120573
119896] are
ratios of diagonal elements of (Z1015840Z)minus1 to reciprocals ofthe diagonal elements of the centered Z1015840Z Apparently thisrests on the logic that 119862perp-orthogonality in 119872
0implies that
Z1015840Z is diagonal However the converse fails and instead isembodied in R
119862isin 1198720 In consequence conventional VIF
119888s
are deficient in applying only to slopes whereas the VF119888s
resulting from Definition 12(ii) apply informatively for all of[1205730 1205731 1205732 120573
119896]
Definition 13 Designate VF119907(sdot) and VF
119888(sdot) as Variance Fac-
tors resulting from Reference models of Definition 8 andDefinition 12(ii) respectively On occasion these representVariance Deflation in addition to VIFs
Essential invariance properties pertain ConventionalVIF119888s are scale invariant see Section 3 of [9]We see next that
they are translation invariant as well To these ends we shiftthe columns of X=[X
1X2 X
119896] to a new origin through
X rarr X minus L = H where L = [11988811119899 11988821119899 119888
1198961119899] The
resulting model is
Y = (12057301119899+ L120573) +H120573 + 120598
lArrrArr Y = (1205730+ 120595) 1
119899+H120573 + 120598
(15)
thus preserving slopes where120595 = c1015840120573=(11988811205731+11988821205732+sdot sdot sdot+119888
119896120573119896)
Corresponding to X0= [1119899X] is U
0= [1119899H] and to X1015840
0X0
and its inverse are
U10158400U0= [
119899 11015840119899H
H10158401119899
H1015840H] (U10158400U0)minus1
= [11988700
b01
b101584001
G ] (16)
in the form of (6)This pertains to subsequent developmentsand basic invariance results emerge as follows
Lemma 14 Consider Y = 12057301119899+ X120573 + 120598 together with the
shifted version Y = (1205730+ 120595)1
119899+ H120573 + 120576 both in 119872
0 Then
the matrices G appearing in (6) and (16) are identical
Proof Rules for block-partitioned inverses again assert thatG of (16) is the inverse of
(H1015840H minus 119899minus1H10158401
11989911015840119899H) = H1015840 (I
119899minus 119899minus1111989911015840119899)H
= H1015840B119899H = X1015840B
119899X
(17)
since B119899L = 0 to complete our proof
These facts in turn support subsequent claims that cen-tered VIF
119888s are translation and scale invariant for slopes
1015840
=
[1205731 1205732 120573
119896] apart from the intercept 120573
0
43 A Critique Again we distinguish VIF119888s and VIF
119906s from
centered and uncentered regressorsThe following commentsapply
(C1) A design is either 119881perp or 119862perp-orthogonal respectivelyaccording as the lower right (119896times119896) blockX1015840X ofX1015840
0X0
orC = (X1015840Xminus119899XX1015840) from expression (6) is diagonalOrthogonalities of type 119881perp and 119862
perp are exclusive andhence work at crossed purposes
(C2) In particular 119862perp-orthogonality holds if and only ifthe columns of X are 119881
perp-nonorthogonal If X is 119862perp-orthogonal then for 119894 = 119895 C
119908(119894119895) = X1015840
119894X119895minus 119899119883119894119883119895=
0 and X1015840119894X119895
= 0 Conversely if X indeed is 119881perp-
orthogonal so that X1015840X = D= Diag(11988911 119889
119896119896)
then (D minus 119899XX1015840) cannot be diagonal as Reference inwhich case the conventional VIF
119888s are tenuous
(C3) Conventional VIF119906s based on uncentered X in 119872
0
with X = 0 do not gage variance inflation as claimedfounded on the false tenet that X1015840
0X0can be diagonal
(C4) To detect influential observations and to classify highleverage points casendashinfluence diagnostics namely119891119895119868
= VIF119895(minus119868)
VIF119895 1 le 119895 le 119896 are studied in [27]
for assessing the impact of subsets on variance infla-tion Here VIF
119895is from the full data and VIF
119895(minus119868)on
deleting observations in the index set 119868 Similarly [28]proposes using VIF
(119894)on deleting the 119894th observation
In the present context these would gain substance onmodifying VF
119907and VF
119888accordingly
5 Case Study 1 Continued
51 The Setting We continue an elementary and transparentexample to illustrate essentials Recall119872
0 119884119894= 1205730+12057311198831+
12057321198832+120598119894 of Section 22 the designX
0= [15X1X2] of order
(5 times 3) and U = X10158400X0 and its inverse V = (X10158400X0)minus1 as in
expressions (1) The design is neither 119881perp nor 119862perp orthogonalsince neither the lower right (2 times 2) block of X1015840
0X0nor the
centered Z1015840Z is diagonal Moreover the uncentered VIF119906s as
listed in (3) are not the vaunted relative increases in variancesowing to nonorthogonal columns of X
0 Indeed the only
opportunity for 119881perp-orthogonality here is between columns
[X1X2]
Nonetheless from Section 41 we utilize Theorem 4(i)and (10) to connect the collinearity indices of [9] to anglesbetween subspaces Specifically Minitab recovered valuesfor the residuals RS
119895= x
119895minus P119895x1198952 119895 = 0 1 2 fur-
ther computations proceed routinely as listed in Table 1 Inparticular the principal angle 120579
0between the constant vector
and the span of the regressor vectors is 1205790= 31751 deg as
anticipated in Remark 7
52 119881perp-Orthogonality For subsequent reference let 119863(119909) bea design as in (1) but with second-moment matrix
U (119909) = [
[
5 3 1
3 25 119909
1 119909 3
]
]
(18)
8 Advances in Decision Sciences
Table 1 Values for residuals RS119895= x119895minus P119895x1198952 for X
1198952 for 1205812
119906119895= VI F
119906(120573119895) for (1 minus 1120581
2
119906119895)12 and for 120579
119895 for 119895 = 0 1 2
119895 RS119895
X1198952
1205812
119895(1 minus 1120581
2
119906119895)12
120579119895(deg)
0 13846 50 36111 08503 317511 06429 25 38889 08619 304702 25714 30 11667 03780 67792
Invoking Definition 8 we seek a 119881perp-orthogonal Reference
with moment matrix U(0) having X10158401X2
= 0 that is 119863(0)This is found constructively on retaining [1
5 ∙X2] in X
0
but replacing X1by X01
= [1 05 05 1 0]1015840 as listed in
Table 2 giving the design 119863(0) with columns [15X01X2] as
ReferenceAccordingly R
119881= U(0) is ldquoas 119881
perp-orthogonal as it cangetrdquo given the lengths and sums of (X
1X2) as prescribed
in the experiment and as preserved in the Reference modelLemma 9 with X=[06 02]
1015840 and D= Diag(25 3) givesX1015840Dminus1X = 0157333 and 120574 = 5(1 minus 5X1015840Dminus1X) =
234375 Applications of Lemma 9(iii)-(iv) in succession givethe reference variances
Var (1205730| X1015840X = D) = [5 (1 minus 5X1015840Dminus1X)]
minus1
= 09375
Var (1205731|X1015840X=D)=
1
11988911
[1+1205741198832
1
11988911
]=2
5[1+
072120574
5]=17500
Var (1205732|X1015840X=D)=
1
11988922
[1+1205741198832
2
11988922
]=1
3[1+
004120574
3]=04375
(19)
As Reference these combine with actual variances from119881 = [119881
119894119895] at (1) giving VF
119907for the original designX
0relative
to R119881 For example VF
119907(1205730) = 0722209375 = 07704 with
11988111
= 07222 from (1) and 09375 from (19) This contrastswith VIF
119906(1205730) = 36111 in (3) as in Section 22 and Table 2
it is noteworthy that the nonorthogonal design 119863(1) yieldsuniformly smaller variances than the 119881
perp-orthogonal R119881
namely119863(0)Further versions 119863(119909) 119909 isin [minus05 00 05 06 10 15]
of our basic design are listed for comparison in Table 2 where119909 = X1015840
1X2for the various designs The designs themselves are
exhibited on transposing rowsX10158401= [11988311 11988312 11988313 11988314 11988315]
from Table 2 and substituting each into the template X00
=
[15 ∙X2] The design 119863(2) was constructed but not listed
since its X10158400X0is not invertible Clearly these are actual
designs amenable to experimental deployment
53 119862perp-Orthogonality Continuing X0
= [15X1X2] and
invoking Definition 12(ii) we seek a 119862perp-orthogonal
Reference having the matrix C119908
as diagonal This is
identified as 119863(06) in Table 2 From this the matrix R119862and
its inverse are
R119862= [
[
5 3 1
3 25 06
1 06 3
]
]
Rminus1119862
= [
[
07286 minus08572 minus00714
minus08571 14286 00
minus00714 00 03571
]
]
(20)
The variance factors are listed in (4) where for exampleVF119888(1205732) = 0388903571 = 10889 As distinct from conven-
tional VIF119888s for 120573
1and 120573
2only our VF
119888(1205730) here reflects
Variance Deflationwherein 1205730is estimated more precisely in
the initial 119862perp-nonorthogonal designA further note on orthogonality is germane Suppose
that the actual experiment is 119863(0) yet towards a thoroughdiagnosis the user evaluates the conventional VIF
119888(1205731) =
12250 = VIF119888(1205732) as ratios of variances of 119863(0) to 119863(06)
from Table 2 Unfortunately their meaning is obscured sincea design cannot at once be 119881perp and 119862
perp-orthogonalAs in Remark 2 we next alter the basic designX
0at (1) on
shifting columns of themeasurements to [0 0] asminima andscaling to have squared lengths equal to 119899 = 5 The resultingX0follows directly from (1) The new matrix U = X
1015840
0X0 andits inverse V are
U = [
[
5 42426 42426
42426 5 4
42426 4 5
]
]
V = [
[
1 minus04714 minus04714
minus04714 07778 minus02222
minus04714 minus02222 07778
]
]
(21)
giving conventional VIF119906s as [5 38889 38889] Against 119862perp-
orthogonality this gives the diagnostics
[VF119888(1205730) VF119888(1205731) VF119888(1205732)] = [08140 10889 10889]
(22)
demonstrating in comparison with (4) that VF119888s apart from
1205730 are invariant under translation and scaling a consequence
of Lemma 14We further seek to compare variances for the shifted
and scaled (X1X2) against 119881perp-orthogonality as Reference
However from U = X10158400X0at (21) we determine that 119883
1=
084853 = 1198832and 5[(119883
2
15) + (119883
2
25)] = 14400 Since
the latter exceeds unity we infer from Lemma 9(i) that
Advances in Decision Sciences 9
Table 2 Designs 119863(119909) 119909 isin [minus05 00 05 06 10 15] obtained by substitutingX01= [11988311 11988312 11988313 11988314 11988315]1015840 in [15 ∙X2] of order (5times3)
and corresponding variances Var (1205730)Var (120573
1)Var (120573
2)
Design 11988311
11988312
11988313
11988314
11988315
Var (1205730) Var (120573
1) Var (120573
2)
119863(minus05) 10 00 05 10 05 19333 37333 09333119863(00) 10 05 05 10 00 09375 17500 04375119863(05) 05 10 00 10 05 07436 14359 03590119863(06) 06091 05793 06298 11819 0000 07286 14286 03570119863(10) 00 05 05 10 10 07222 15556 03889119863(15) 00 10 05 05 10 09130 24348 06087
119881perp-orthogonality is incompatible with this configuration
of the data so that the VF119907s are undefined Equivalently
on evaluating 1206011
= 1206012
= 2290 deg and 1206011+ 1206012
=
4580 lt 90 deg Corollary 10 asserts that R119881is not positive
definite This appears to be anomalous until we recallthat 119881perp-orthogonality is invariant under rescaling but notrecentering the regressors
54 A Critique We reiterate apparent ambiguity ascribedin the literature to orthogonality A number of widely heldguiding precepts has evolved as enumerated in part in ouropening paragraph and in Section 3 As in Remark 3 theseclearly have been taken to apply verbatim forX
0andX1015840
0X0in
1198720 to be paraphrased in part as follows(P1) Ill-conditioning espouses inflated variances that is
VIFs necessarily equal or exceed unity(P2) 119862perp-orthogonal designs are ldquoidealrdquo in that VIF
119888s for
such designs are all unity see [11] for exampleWe next reassess these precepts as they play out under119881perp and 119862
perp orthogonality in connection with Table 2(C1) For models in 119872
0having X = 0 we reiterate that
the uncentered VIF119906s at expression (3) namely
[36111 38889 11667] overstate adverse effects onvariances of the 119881perp-nonorthogonal array since X1015840
0X0
cannot be diagonal Moreover these conventionalvalues fail to discern that the revised values for VF
119907s
at (5) namely [07704 08889 08889] reflect that[1205730 1205731 1205732] are estimated with greater efficiency in the
119881perp-nonorthogonal array119863(1)
(C2) For 119881perp-orthogonality the claim P1 thus is false by
counterexampleTheVFs at (5) areVarianceDeflationFactors for design 119863(1) where X1015840
1X2= 1 relative to
the 119881perp-orthogonal119863(0)(C3) Additionally the variances Var[120573
0(119909)] Var[120573
1(119909)]
Var[1205732(119909)] in Table 2 are all seen to decrease from
119863(0) [09375 17500 04375] rarr 119863(1) [0722215556 03889] despite the transition from the 119881
perp-orthogonal 119863(0) to the 119881
perp-nonorthogonal 119863(1)Similar trends are confirmed in other ill-conditioneddata sets from the literature
(C4) For 119862perp-orthogonal designs the claim P2 is false by
counterexample Such designs need not be ldquoidealrdquoas 1205730may be estimated more efficiently in a 119862
perp-nonorthogonal design as demonstrated by the VF
119888s
for 1205730at (4) and (22) Similar trends are confirmed
elsewhere as seen subsequently
(C5) VFs for 1205730are critical in prediction where prediction
variances necessarily depend on Var(1205730) especially
for predicting near the origin of the system of coor-dinates
(C6) Dissonance between 119881perp and 119862
perp is seen in Table 2where 119863(06) as 119862
perp-orthogonal with X10158401X2
= 06is the antithesis of 119881perp-orthogonality at 119863(0) whereX10158401X2= 0
(C7) In short these transparent examples serve to dispelthe decades-old mantra that ill-conditioning neces-sarily spawns inflated variances formodels in119872
0 and
they serve to illuminate the contributing structures
6 Orthogonal and Linked Arrays
A genuine 119881perp-orthogonal array was generated as eigenvec-
tors E = [E1E2 E
8] from a positive definite (8 times 8)
matrix (see Table 83 of [11 p377]) using PROC IML of theSAS programming package The columns [X
1X2X3] as the
second through fourth columns of Table 3 comprise the firstthree eigenvectors scaled to length 8 These apply for modelsin 119872119908and 119872
0to be analyzed In addition linked vectors
(Z1Z2Z3) were constructed as Z
1= 119888(09E
1+ 01E
2)
Z2
= 119888(09E2+ 01E
3) and Z
3= 119888(09E
3+ 01E
4) where
119888 = radic8082 Clearly these arrays as listed in Table 3 are notarchaic abstractions as both are amenable to experimentalimplementation
61 The Model 1198720 We consider in turn the orthogonal and
the linked series
611 Orthogonal Data Matrices X10158400X0and (X1015840
0X0)minus1 for the
orthogonal data under model1198720are listed in Table 4 where
variances occupy diagonals of (X10158400X0)minus1 The conventional
uncentered VIF119906s are [120296 102144 112256 105888]
Since X = 0 we find the angle between the constant andthe span of the regressor vectors to be 120579
0= 65748 deg as
in Theorem 4(ii) Moreover the angle between X1and the
span of [1119899X2X3] namely 120579
1= 81670 deg is not 90 deg
because of collinearity with the constant vector despite themutual orthogonality of [X
1X2X3] Observe here thatX1015840
0X0
10 Advances in Decision Sciences
Table 3 Basic 119881perp-orthogonal design X0 = [18 X1 X2 X3] and a linked design Z0 = [18 Z1 Z2 Z3] of order (8 times 4)
Orthogonal (X1X2X3) Linked (Z
1Z2Z3)
18
X1
X2
X3
18
Z1
Z2
Z3
1 minus1084470 0056899 0541778 1 minus1071550 0116381 06534861 minus1090010 minus0040490 0117026 1 minus1087820 minus0027320 01160861 0979050 0002564 2337454 1 0973345 0260677 21996871 minus1090230 0016839 0170711 1 minus1081700 0035588 00706941 0127352 2821941 minus0071670 1 0438204 2796767 minus00707101 1045409 minus0122670 minus1476870 1 1025468 minus0285010 minus16185601 1090803 minus0088970 0043354 1 1074306 minus0083640 01769281 1090739 minus0092300 0108637 1 1073875 minus0079740 0244567
Table 4 Matrices X10158400X0 and (X10158400X0)minus1 for the orthogonal data under model119872
0
X10158400X0
(X10158400X0)minus1
8 10686 25538 17704 01504 minus00201 minus00480 minus0033310686 8 0 0 minus00201 01277 00064 0004425538 0 8 0 minus00480 00064 01403 0010617704 0 0 8 minus00333 00044 00106 01324
already is 119881perp-orthogonal accordingly R119881= X10158400X0 and thus
the VF119907s are all unity
In view of dissonance between 119881perp and 119862
perp orthogonalitythe sums of squares and products for the mean-centered Xare
X1015840B119899X = [
[
78572 minus03411 minus02365
minus03411 71847 minus05652
minus02365 minus05652 76082
]
]
(23)
reflecting nonnegligible negative dependencies to distin-guish the 119881
perp-orthogonal X from the 119862perp-nonorthogonal
matrix Z = B119899X of deviations
To continue we suppose instead that the 119881perp-orthogonalmodel was to be recast as 119862
perp-orthogonal The Referencemodel and its inverse are listed in Table 5 Variance factorsas ratios of diagonal elements of (X1015840
0X0)minus1 in Table 4 to those
of Rminus1119862
in Table 5 are
[VF119888(1205730) VF119888(1205731) VF119888(1205732) VF119888(1205733)]
= [10168 10032 10082 10070]
(24)
reflecting negligible differences in precision This parallelsresults reported in Table 2 comparing 119863(0) to 119863(06) asReference except for larger differences in Table 2 namely[128681225012250]
612 Linked Data For the Linked Data under model 1198720
the matrix Z10158400Z0 follows routinely from Table 3 its inverse(Z10158400Z0)minus1 is listed in Table 6 The conventional uncentered
VIF119906s are now [120320 103408 113760 105480] These
again fail to gage variance inflation since Z10158400Z0cannot be
diagonal From (10) the principal angle between the constantvector and the span of the regressor vectors is 120579
0= 65735
degrees
Remark 15 Observe that the choice for R119862may be posed as
seeking R119862
= U(119909) such that Rminus1119862(2 3) = 00 then solving
numerically for 119909 using Maple for example However thealgorithm in Definition 12(ii) affords a direct solution thatC119908should be diagonal stipulates its off-diagonal element as
X10158401X2minus 35 = 0 so that X1015840
1X2= 06 in R
119862at expression (20)
To illustrate Theorem 4(iii) we compute cos(120579) = (1 minus
110889)12 = 02857 and 120579 = 73398 deg as the angle between
the vectors [X1X2] of (1) when centered to their means For
slopes in 1198720 [9] shows that VIF
119888(120573119895) lt VIF
119906(120573119895) 1 le
119894 le 119896 This follows since their numerators are equal butdenominators are reciprocals of lengths of the centered anduncentered regressor vectors To illustrate numerators areequal for VIF
119906(1205733) = 03889(13) = 11667 and for
VIF119888(1205733) = 03889(128) = 10889 but denominators are
reciprocals of X10158402X2= 3 and sum
5
119895=1(1198832119895
minus 1198832)2= 28
(119894) 119881perp Reference Model To rectify this malapropism we seek
in Definition 8 a model R119881as Reference This is found on
setting all off-diagonal elements of Z10158400Z0to zero excluding
the first row and column Its inverse Rminus1119881
is listed in Table 6The VF
119907s are ratios of diagonal elements of (Z1015840
0Z0)minus1 on the
left to diagonal elements of Rminus1119881
on the right namely
[VF119907(1205730) VF119907(1205731) VF119907(1205732) VF119907(1205733)]
= [09697 09991 09936 09943]
(25)
Against conventional wisdom it is counter-intuitive that[1205730 1205731 1205732 1205733] are all estimated with greater precision in the
119881perp-nonorthogonal model Z1015840
0Z0than in the 119881
perp-orthogonalmodel R
119881of Definition 8 As in Section 54 this serves again
to refute the tenet that ill-conditioning espouses inflated
Advances in Decision Sciences 11
Table 5 Matrices R119862and Rminus1
119862for checking 119862
perp-orthogonality in the orthogonal data X0 under model1198720
R119862
Rminus1119862
8 10686 25538 17704 01479 minus00170 minus00444 minus0029110686 8 03411 02365 minus00170 01273 0 025538 03411 8 05652 minus00444 0 01392 017704 02365 05652 8 minus00291 0 0 01314
Table 6 Matrices (Z10158400Z0)minus1 and Rminus1
119881for checking 119881
perp-orthogonality for the linked data under model1198720
(Z10158400Z0)minus1 Rminus1
119881
01504 minus00202 minus00461 minus00283 01551 minus00261 minus00530 minus00344minus00202 01293 minus000797 00053 minus00261 01294 00089 00058minus00461 minus00079 01422 minus00054 minus00530 00089 01431 00117minus00283 00053 minus00054 01319 minus00344 00058 00117 01326
variances for models in1198720 that is that VFs necessarily equal
or exceed unity(119894119894) 119862
perp Reference Model Further consider variances werethe Linked Data to be centered As in Definition 12(ii)we seek a Reference model R
119862such that C
119908= (W minus
M) is diagonal The required R119862
and its inverse arelisted in Table 7 Since variances in the Linked Data are[015040 012926 014220 013185] from Table 6 and Refer-ence values appear as diagonal elements of Rminus1
119862on the right
in Table 7 their ratios give the centered VF119888s as
[VF119888(1205730) VF119888(1205731) VF119888(1205732) VF119888(1205733)]
= [09920 10049 10047 10030]
(26)
We infer that changes in precision would be negligible ifinstead the Linked Data experiment was to be recast as a 119862perp-orthogonal experiment
As in Remark 15 the reference matrix R119862is constrained
by the first and second moments from X10158400X0and has free
parameters 1199091 1199092 1199093 for the (2 3) (2 4) (3 4) entries
The constraints for 119862perp-orthogonality are Rminus1
119862(2 3) =
Rminus1119862(2 4) = Rminus1
119862(3 4) = 0 Although this system of
equations can be solved with numerical software such asMaple the algorithm stated in Definition 12 easily yields thedirect solution given here
62 The Model 119872119908 For the Linked Data take 119884
119894= 12057311198851198941+
12057321198851198942
+ 12057331198851198943
+ 120598119894 1 le 119894 le 8 as Y = Z120573 + 120598
where the expected response at 1198851
= 1198852
= 1198853
= 0
is 119864(119884) = 00 and there is no occasion to translate theregressorsThe lower right (3 times 3) submatrix ofZ1015840
0Z0(4 times 4)
from 1198720is Z1015840Z its inverse is listed in Table 8 The ratios
of diagonal elements of (Z1015840Z)minus1 to reciprocals of diagonalelements of Z1015840Z are the conventional uncentered VIF
119906s
namely [10123 10247 10123] Equivalently scaling Z1015840Z tohave unit diagonals and inverting gives Cminus1 as in Table 8 itsdiagonal elements are the VIF
119906sThroughout Section 6 these
comprise the only correct interpretation of conventionalVIF119906s as genuine variance ratios
7 Case Study Body Fat Data
71 The Setting Body fat and morphogenic measurementswere reported for 119899 = 20 human subjects in Table D11 of[29 p717] Response and regressor variables are119884 amount ofbody fat X
1 triceps skinfold thickness X
2 thigh circumfer-
ence and X3 mid-arm circumference under119872
0 119884119894= 1205730+
12057311198831+12057321198832+12057331198833+ 120598119894 From the original data as reported
the condition number of X10158400X0is 1039931 and the VIF
119906s are
[2714800 4469419 630998 778703] On scaling columnsof X to equal column lengths X
119894 = 50 119894 = 1 2 3 the
resulting condition number of the revisedX10158400X0is 2 9696518
and the uncenteredVIF119906s are as before from invariance under
scalingAgainst complaints that regressors are remote from their
natural origins we center regressors to [0 0 0] as origin onsubtracting the minimal element from each column as inRemark 2 and scaling these to X
119894 = 50 119894 = 1 2 3
This is motivated on grounds that centered diagnostics apartfrom VIF
119888(1205730) are invariant under shifts and scalings from
Lemma 14 In addition under the new origin [0 0 0] 1205730
assumes prominence as the baseline response against whichchanges due to regressors are to be gaged The original dataare available on the publisherrsquos web page for [29]
Taking these shifted and scaled vectors as columnsof X = [X
1X2X3] in the revised X
0= [1
119899X]
values for X10158400X0
and its inverse are listed in Table 9The condition number of X1015840
0X0
is now 1136969 andits VIF
119906s are [6775617998742782174484] Additionally
from Theorem 4(ii) we recover the angle 1205790= 22592 deg to
quantify collinearity of regressors with the constant vectorIn addition the scaled and centered correlationmatrix for
X is
R = [
[
10 00847 08781
00847 10 01424
08781 01424 10
]
]
(27)
indicating strong linkage between mean-centered versions of(X1X3)Moreover the experimental design is neither119881perp nor
119862perp-orthogonal since neither the lower right (3 times 3) block of
X10158400X0nor the centered matrixC = (X1015840Xminus119899XX1015840) is diagonal
12 Advances in Decision Sciences
Table 7 Matrices R119862and Rminus1
119862for checking 119862
perp-orthogonality in the linked data under model1198720
R119862
Rminus1119862
8 13441 27337 17722 01516 minus00216 minus00484 minus0029113441 8 04593 02977 minus00216 01286 0 027337 04593 8 06056 minus00484 0 01415 017722 02977 06056 8 minus00291 0 0 01314
Table 8 Matrices (Z1015840Z)minus1 and Cminus1 for the linked data under model119872119908
(Z1015840Z)minus1 Cminus1
01265 minus00141 00015 10123 minus01125 00123minus00141 01281 minus00141 minus01125 10247 minus0112500015 minus00141 01265 00123 minus01125 10123
(119894) 119881perp Reference Model To compare this with a 119881
perp-or-thogonal design we turn again to Definition 8 Howeverevaluating the test criterion in Lemma 9(119894) shows that thisconfiguration is not commensurate with 119881
perp-orthogonalityso that the VF
119907s are undefined Comparisons with 119862
perp-orthogonal models are addressed next(119894119894) 119862perp Reference Model As in Definition 12(ii) the centering
matrix is
119899XX1015840 = [
[
188889 189402 187499
189402 189916 188007
187499 188007 186118
]
]
(28)
To check against 119862perp-orthogonality we obtain the matrix Wof Definition 12(ii) on replacing off-diagonal elements of thelower right (3 times 3) submatrix ofX1015840
0X0by corresponding off-
diagonal elements of 119899XX1015840 The result is the matrix R119862and
its inverse as given in Table 10 where diagonal elements ofRminus1119862
are the Reference variances The VF119888s found as ratios
of diagonal elements of (X10158400X0)minus1 relative to those of Rminus1
119862
are listed in Table 12 under G = [0 0 0] the indicator ofDefinition 12(i) for this case
72 Variance Factors and Linkage Traditional gages of ill-conditioning are patently absurd on occasion By conventionVIFs are ldquoall or nonerdquo wherein Reference models entailstrictly diagonal components In practice some regressorsare inextricably linked to get or even visualize orthogonalregressors may go beyond feasible experimental rangesrequire extrapolation beyond practical limits and challengecredulity Response functions so constrained are described in[30] as ldquopicket fencesrdquo In such cases taking C to be diagonalas its 119862perp Reference is moot at best an academic folly abettedin turn by default in standard software packages In shortconventional diagnostics here offer answers to irrelevantquestions Given pairs of regressors irrevocably linked weseek instead to assess effects on variances for other pairsthat could be unlinked by design These comments applyespecially to entries under G = [0 0 0] in Table 12
To proceed we define essential ill-conditioning as regres-sors inextricably linked necessarily to remain so and asnonessential if regressors could be unlinked by design As a
followup to Section 32 for X = 0 we infer that the constantvector is inextricably linked with regressors thus accountingfor essential ill-conditioning not removed by centeringcontrary to claims in [4] Indeed this is the essence ofDefinition 8
Fortunately these limitations may be remedied throughReference models adapted to this purpose This in turnexploits the special notation and trappings of Definition 12(i)as follows In addition toG = [0 0 0] we iterate for indicatorsG = [119892
12 11989213 11989223] taking values in [1 0 0] [0 1 0] [0 0 1]
[1 1 0] [1 0 1] [0 1 1] Values of VF119888s thus obtained are
listed in Table 12 To fix ideas the Reference R119862for G =
[1 0 0] that is for the constraint R119862(2 3) = X1015840
0X0(2 3) =
194533 is given in Table 11 together with its inverse Foundas ratios of diagonal elements of (X1015840
0X0)minus1 relative to those of
Rminus1119862
in Table 11 VF119888s are listed in Table 12 underG = [1 0 0]
In short R119862is ldquoas 119862
perp-orthogonal as it could berdquo given theconstraint Other VF
119888s in Table 12 proceed similarly For all
cases in Table 12 excluding G = [0 1 1] 1205730is estimated
with greater efficiency in the original model than any of thepostulated Reference models
As in Remark 15 the reference matrix R119862
for G =
[1 0 0] is constrained by X10158401X2
= 194533 to be retainedwhereas X1015840
1X3
= 1199091and X1015840
2X3
= 1199092are to be determined
from Rminus1119862(2 4) = Rminus1
119862(3 4) = 0 Although this system
can be solved numerically as noted the algorithm stated inDefinition 12 easily yields the direct solution given here
Recall 1198831 triceps skinfold thickness 119883
2 thigh cir-
cumference and 1198833 mid-arm circumference and from
(27) that (1198831 1198833) are strongly linked Invoking the cutoff
rule [24] for VIFs in excess of 40 it is seen for G =
[0 0 0] in Table 12 that VF119888(1205731) and VF
119888(1205733) are excessive
where [X1X2X3] are forced to be mutually uncorrelated
as Reference Remarkably similar values are reported forG isin [1 0 0] [ 0 0 1] [1 0 1] all gaged against intractableReference models wherein (119883
1 1198833) are postulated to be
uncorrelated however incrediblyOn the other hand VF
119888s for G isin [0 1 0] [1 1 0]
[0 1 1] in Table 12 are likewise comparable where (X1X3)
are allowed to remain linked at X10158401X3= 242362 Here the
VF119888s reflect negligible changes in efficiency of the estimates
Advances in Decision Sciences 13
Table 9 Matrices X10158400X0 and (X10158400X0)minus1 for the body fat data under model119872
0centered to minima and scaled to equal lengths
X10158400X0
(X10158400X0)minus1
20 194365 194893 192934 03388 minus01284 minus01482 minus00203194365 25 194533 242362 minus01284 07199 00299 minus06224194893 194533 25 196832 minus01482 00299 01711 minus00494192934 242362 196832 25 minus00203 minus06224 minus00494 06979
Table 10 Matrices R119862and Rminus1
119862for the body fat data adjusted to give off-diagonal elements the value 00 with code G = [0 0 0]
R119862
Rminus1119862
20 194365 194893 192934 05083 minus01590 minus01622 minus01510194365 25 189402 187499 minus01590 01636 0 0194893 189402 25 188007 minus01622 0 01664 0192934 187499 188007 25 minus01510 0 0 01565
in comparison with the originalX10158400X0 where [X
1X2X3] are
all linked In summary negligible changes in overall efficiencywould accrue on recasting the experiment so that (X
1X2)
and (X2X3) were pairwise uncorrelated
Table 12 conveys further useful information on notingthat VFs as ratios are multiplicative and divisible Forexample take the model codedG = [1 1 0] whereX1015840
1X2and
X10158401X3are retained at their initial values under X1015840
0X0 namely
194533 and 242362 from Table 9 with (X2X3) unlinked
Now taking G = [0 1 0] as reference we extract the VFson dividing elements in Table 12 at G = [0 1 0] by those ofG = [1 1 0] The result is
[VF (1205730) VF (120573
1) VF (120573
2) VF (120573
3)]
= [09196
0984810073
1005710282
1020810208
10208]
= [09338 10017 10072 10000]
(29)
8 Conclusions
Our goal is clarity in the actual workings of VIFs as ill-conditioning diagnostics thus to identify and rectifymiscon-ceptions regarding the use of VIFs for models X
0= [1119899X]
with intercept Would-be arbiters for ldquocorrectrdquo usage havedivided between VIF
119888s for mean-centered regressors and
VIF119906s for uncentered data We take issue with both Conven-
tional but clouded vision holds that (i) VIF119906s gage variance
inflation owing to nonorthogonal columns of X0as vectors
in R119899 (ii) VIFs ge 10 and (iii) models having orthogonalmean-centered regressors are ldquoidealrdquo in possessing VIF
119888s of
value unity Reprising Remark 3 these properties widely andcorrectly understood for models in119872
119908 appear to have been
expropriated without verification for models in1198720hence the
anomaliesAccordingly we have distinguished vector-space orthog-
onality (119881perp) of columns of X0 in contrast to orthogonality
(119862perp) of mean-centered columns of X A key feature is theconstruction of second-moment arrays as Reference models(R119881R119862) encoding orthogonalities of these types and their
Variance Factors as VF119907rsquos and VF
119888rsquos respectively Contrary
to convention we demonstrate analytically and through casestudies that (i1015840) VIF
119906s do not gage variance inflation (ii1015840)
VFs ge 10 need not hold whereas VF lt 10 representsVariance Deflation and (iii1015840) models having orthogonalcentered regressors are not necessarily ldquoidealrdquo In particularvariance deflation occurs when ill-conditioned data yieldsmaller variances than corresponding orthogonal surrogates
In short our studies call out to adjudicate the prolongeddispute regarding centered and uncentered diagnostics formodels in119872
0 Our summary conclusions follow
(S1) The choice of a model with or without intercept isa substantive matter in the context of each exper-imental paradigm That choice is fixed beforehandin a particular setting is structural and is beyondthe scope of the present study Snee and Marquardt[10] correctly assert that VIFs are fully creditedand remain undisputed in models without interceptthese quantities unambiguously retain the meaningsoriginally intended namely as ratios of variances togage effects of nonorthogonal regressors For modelswith intercept the choice is between centered anduncentered diagnostics which is to be weighed asfollows
(S2) ConventionalVIF119888s apply incompletely to slopes only
not necessarily grasping fully that a system is illconditioned Of particular concern is missing evi-dence on collinearity of regressors with the interceptOn the other hand if 119862
perp-orthogonality pertainsthen VF
119888s may supplant VIF
119888s as applicable to all of
[1205730 1205731 120573
119896] Moreover the latter may reveal that
VF119888(1205730) lt 10 as in cited examples and of special
import in predicting near the origin of the coordinatesystem
(S3) Our VF119907s capture correctly the concept apparently
intended by conventional VIF119906s Specifically if 119881perp-
orthogonality pertains then VF119907s as genuine ratios
of variances may supplant VIF119906s as now discredited
gages for variance inflation(S4) Returning to Section 32 we subscribe fully but for
different reasons to Belsleyrsquos and othersrsquo contention
14 Advances in Decision Sciences
Table 11 Matrices R119862and Rminus1
119862for the body fat data adjusted to give off-diagonal elements the value 00 except X10158401X2 code G = [1 0 0]
R119862
Rminus1119862
20 194365 194893 192934 04839 minus01465 minus01497 minus01510194365 25 194533 187499 minus01465 01648 minus00141 0194893 194533 25 188007 minus01497 minus00141 01676 0192934 187499 188007 25 minus01510 0 0 01565
Table 12 Summary VF119888s for the body fat data centered to minima and scaled to common lengths identified with varying G = [119892
12 11989213 11989223]
from Definition 12
Elements of G VF119888s
11989212
11989213
11989223
1205730
1205731
1205732
1205733
0 0 0 06665 43996 10282 44586
1 0 0 07002 43681 10208 44586
0 1 0 09196 10073 10282 10208
0 0 1 07201 43996 10073 43681
1 1 0 09848 10057 10208 10208
1 0 1 07595 43681 10003 43681
0 1 1 10248 10073 10073 10160
that uncentered VIF119906s are essential in assessing ill-
conditioned data In the geometry of ill-conditioningthese should be retained in the pantheon of regres-sion diagnostics not as now debunked gages ofvariance inflation but as more compelling angularmeasures of the collinearity of each column of X
0=
[1119899X1X2 X
119896] with the subspace spanned by
remaining columns(S5) Specifically the degree of collinearity of regressors
with the intercept is quantified by 1205790deg which
if small is known to ldquocorrupt the estimates of allparameters in the model whether or not the interceptis itself of interest and whether or not the data havebeen centeredrdquo [21 p90]This in turnwould appear toelevate 120579
0in importance as a critical ill-conditioning
diagnostic(S6) In contrast Theorem 4(iii) identifies conventional
VIF119888s with angles between a centered regressor and
the subspace spanned by remaining centered regres-sors For example
1205791= arccos([
[
1 minus1
VIFc ( ldquo1205731)]
]
12
) (30)
is the angle between the centered vector Z1= B119899X1
and the remaining centered vectors These lie in thelinear subspace of R119899 comprising the orthogonalcomplement to 1
119899isin R119899 and thus bear on the
geometry of ill-conditioning in this subspace(S7) Whereas conventional VIFs are ldquoall or nonerdquo to
be gaged against strictly diagonal components ourReference models enable unlinking what may beunlinked while leaving other pairs of regressorslinked This enables us to assess effects on variances
attributed to allowable partial dependencies amongsome but not other regressors
In conclusion the foregoing summary points to limita-tions in modern software packages Minitab v15 119878119875119878119878 v19and 119878119860119878 v92 are standard software packages which computecollinearity diagnostics returning with the default optionsthe centered VIFs VIF
119888(120573119895) 1 le 119895 le 119896 To compute
the uncentered VIF119906s one needs to define a variable for
the constant term and perform a linear regression using theoptions of fitting without an intercept On the other handcomputations for VF
119888s and VF
119907s as introduced here require
additional if straightforward programming for exampleMaple and PROC IML of the SAS System as utilized here
References
[1] D A Belsley ldquoDemeaning conditioning diagnostics throughcenteringrdquo The American Statistician vol 38 no 2 pp 73ndash771984
[2] E R Mansfield and B P Helms ldquoDetecting multicollinearityrdquoThe American Statistician vol 36 no 3 pp 158ndash160 1982
[3] S D Simon and J P Lesage ldquoThe impact of collinearityinvolving the intercept term on the numerical accuracy ofregressionrdquo Computer Science in Economics and Managementvol 1 no 2 pp 137ndash152 1988
[4] D W Marquardt and R D Snee ldquoRidge regression in practicerdquoThe American Statistician vol 29 no 1 pp 3ndash20 1975
[5] K N Berk ldquoTolerance and condition in regression computa-tionsrdquo Journal of the American Statistical Association vol 72 no360 part 1 pp 863ndash866 1977
[6] A E Beaton D Rubin and J Barone ldquoThe acceptability ofregression solutions another look at computational accuracyrdquoJournal of the American Statistical Association vol 71 no 353pp 158ndash168 1976
Advances in Decision Sciences 15
[7] R B Davies and B Hutton ldquoThe effect of errors in theindependent variables in linear regressionrdquo Biometrika vol 62no 2 pp 383ndash391 1975
[8] D W Marquardt ldquoGeneralized inverses ridge regressionbiased linear estimation and nonlinear estimationrdquo Technomet-rics vol 12 no 3 pp 591ndash612 1970
[9] G W Stewart ldquoCollinearity and least squares regressionrdquoStatistical Science vol 2 no 1 pp 68ndash84 1987
[10] R D Snee and D W Marquardt ldquoComment collinearitydiagnostics depend on the domain of prediction themodel andthe datardquo The American Statistician vol 38 no 2 pp 83ndash871984
[11] R HMyers Classical andModern Regression with ApplicationsPWS-Kent Boston Mass USA 2nd edition 1990
[12] D A Belsley E Kuh and R E Welsch Regression DiagnosticsIdentifying Influential Data and Sources of Collinearity JohnWiley amp Sons New York NY USA 1980
[13] A van der Sluis ldquoCondition numbers and equilibration ofmatricesrdquo Numerische Mathematik vol 14 pp 14ndash23 1969
[14] G C S Wang ldquoHow to handle multicollinearity in regressionmodelingrdquo Journal of Business Forecasting Methods amp Systemvol 15 no 1 pp 23ndash27 1996
[15] G C S Wang and C L Jain Regression Analysis Modeling ampForecasting Graceway Flushing NY USA 2003
[16] N Adnan M H Ahmad and R Adnan ldquoA comparative studyon some methods for handling multicollinearity problemsrdquoMatematika UTM vol 22 pp 109ndash119 2006
[17] D R Jensen and D E Ramirez ldquoAnomalies in the foundationsof ridge regressionrdquo International Statistical Review vol 76 pp89ndash105 2008
[18] D R Jensen and D E Ramirez ldquoSurrogate models in ill-conditioned systemsrdquo Journal of Statistical Planning and Infer-ence vol 140 no 7 pp 2069ndash2077 2010
[19] P F Velleman and R E Welsch ldquoEfficient computing ofregression diagnosticsrdquo The American Statistician vol 35 no4 pp 234ndash242 1981
[20] R F Gunst ldquoRegression analysis with multicollinear predictorvariables definition detection and effectsrdquo Communications inStatistics A vol 12 no 19 pp 2217ndash2260 1983
[21] G W Stewart ldquoComment well-conditioned collinearityindicesrdquo Statistical Science vol 2 no 1 pp 86ndash91 1987
[22] D A Belsley ldquoCentering the constant first-differencing andassessing conditioningrdquo in Model Reliability D A Belsley andE Kuh Eds pp 117ndash153 MIT Press Cambridge Mass USA1986
[23] G Smith and F Campbell ldquoA critique of some ridge regressionmethodsrdquo Journal of the American Statistical Association vol 75no 369 pp 74ndash103 1980
[24] R M Orsquobrien ldquoA caution regarding rules of thumb for varianceinflation factorsrdquoQualityampQuantity vol 41 no 5 pp 673ndash6902007
[25] R F Gunst ldquoComment toward a balanced assessment ofcollinearity diagnosticsrdquo The American Statistician vol 38 pp79ndash82 1984
[26] F A Graybill Introduction to Matrices with Applications inStatistics Wadsworth Belmont Calif USA 1969
[27] D Sengupta and P Bhimasankaram ldquoOn the roles of observa-tions in collinearity in the linearmodelrdquo Journal of the AmericanStatistical Association vol 92 no 439 pp 1024ndash1032 1997
[28] J O van Vuuren and W J Conradie ldquoDiagnostics and plotsfor identifying collinearity-influential observations and for clas-sifying multivariate outliersrdquo South African Statistical Journalvol 34 no 2 pp 135ndash161 2000
[29] R R HockingMethods and Applications of LinearModels JohnWiley amp Sons Hoboken NJ USA 2nd edition 2003
[30] R R Hocking and O J Pendleton ldquoThe regression dilemmardquoCommunications in Statistics vol 12 pp 497ndash527 1983
8 Advances in Decision Sciences
Table 1 Values for residuals RS119895= x119895minus P119895x1198952 for X
1198952 for 1205812
119906119895= VI F
119906(120573119895) for (1 minus 1120581
2
119906119895)12 and for 120579
119895 for 119895 = 0 1 2
119895 RS119895
X1198952
1205812
119895(1 minus 1120581
2
119906119895)12
120579119895(deg)
0 13846 50 36111 08503 317511 06429 25 38889 08619 304702 25714 30 11667 03780 67792
Invoking Definition 8 we seek a 119881perp-orthogonal Reference
with moment matrix U(0) having X10158401X2
= 0 that is 119863(0)This is found constructively on retaining [1
5 ∙X2] in X
0
but replacing X1by X01
= [1 05 05 1 0]1015840 as listed in
Table 2 giving the design 119863(0) with columns [15X01X2] as
ReferenceAccordingly R
119881= U(0) is ldquoas 119881
perp-orthogonal as it cangetrdquo given the lengths and sums of (X
1X2) as prescribed
in the experiment and as preserved in the Reference modelLemma 9 with X=[06 02]
1015840 and D= Diag(25 3) givesX1015840Dminus1X = 0157333 and 120574 = 5(1 minus 5X1015840Dminus1X) =
234375 Applications of Lemma 9(iii)-(iv) in succession givethe reference variances
Var (1205730| X1015840X = D) = [5 (1 minus 5X1015840Dminus1X)]
minus1
= 09375
Var (1205731|X1015840X=D)=
1
11988911
[1+1205741198832
1
11988911
]=2
5[1+
072120574
5]=17500
Var (1205732|X1015840X=D)=
1
11988922
[1+1205741198832
2
11988922
]=1
3[1+
004120574
3]=04375
(19)
As Reference these combine with actual variances from119881 = [119881
119894119895] at (1) giving VF
119907for the original designX
0relative
to R119881 For example VF
119907(1205730) = 0722209375 = 07704 with
11988111
= 07222 from (1) and 09375 from (19) This contrastswith VIF
119906(1205730) = 36111 in (3) as in Section 22 and Table 2
it is noteworthy that the nonorthogonal design 119863(1) yieldsuniformly smaller variances than the 119881
perp-orthogonal R119881
namely119863(0)Further versions 119863(119909) 119909 isin [minus05 00 05 06 10 15]
of our basic design are listed for comparison in Table 2 where119909 = X1015840
1X2for the various designs The designs themselves are
exhibited on transposing rowsX10158401= [11988311 11988312 11988313 11988314 11988315]
from Table 2 and substituting each into the template X00
=
[15 ∙X2] The design 119863(2) was constructed but not listed
since its X10158400X0is not invertible Clearly these are actual
designs amenable to experimental deployment
53 119862perp-Orthogonality Continuing X0
= [15X1X2] and
invoking Definition 12(ii) we seek a 119862perp-orthogonal
Reference having the matrix C119908
as diagonal This is
identified as 119863(06) in Table 2 From this the matrix R119862and
its inverse are
R119862= [
[
5 3 1
3 25 06
1 06 3
]
]
Rminus1119862
= [
[
07286 minus08572 minus00714
minus08571 14286 00
minus00714 00 03571
]
]
(20)
The variance factors are listed in (4) where for exampleVF119888(1205732) = 0388903571 = 10889 As distinct from conven-
tional VIF119888s for 120573
1and 120573
2only our VF
119888(1205730) here reflects
Variance Deflationwherein 1205730is estimated more precisely in
the initial 119862perp-nonorthogonal designA further note on orthogonality is germane Suppose
that the actual experiment is 119863(0) yet towards a thoroughdiagnosis the user evaluates the conventional VIF
119888(1205731) =
12250 = VIF119888(1205732) as ratios of variances of 119863(0) to 119863(06)
from Table 2 Unfortunately their meaning is obscured sincea design cannot at once be 119881perp and 119862
perp-orthogonalAs in Remark 2 we next alter the basic designX
0at (1) on
shifting columns of themeasurements to [0 0] asminima andscaling to have squared lengths equal to 119899 = 5 The resultingX0follows directly from (1) The new matrix U = X
1015840
0X0 andits inverse V are
U = [
[
5 42426 42426
42426 5 4
42426 4 5
]
]
V = [
[
1 minus04714 minus04714
minus04714 07778 minus02222
minus04714 minus02222 07778
]
]
(21)
giving conventional VIF119906s as [5 38889 38889] Against 119862perp-
orthogonality this gives the diagnostics
[VF119888(1205730) VF119888(1205731) VF119888(1205732)] = [08140 10889 10889]
(22)
demonstrating in comparison with (4) that VF119888s apart from
1205730 are invariant under translation and scaling a consequence
of Lemma 14We further seek to compare variances for the shifted
and scaled (X1X2) against 119881perp-orthogonality as Reference
However from U = X10158400X0at (21) we determine that 119883
1=
084853 = 1198832and 5[(119883
2
15) + (119883
2
25)] = 14400 Since
the latter exceeds unity we infer from Lemma 9(i) that
Advances in Decision Sciences 9
Table 2 Designs 119863(119909) 119909 isin [minus05 00 05 06 10 15] obtained by substitutingX01= [11988311 11988312 11988313 11988314 11988315]1015840 in [15 ∙X2] of order (5times3)
and corresponding variances Var (1205730)Var (120573
1)Var (120573
2)
Design 11988311
11988312
11988313
11988314
11988315
Var (1205730) Var (120573
1) Var (120573
2)
119863(minus05) 10 00 05 10 05 19333 37333 09333119863(00) 10 05 05 10 00 09375 17500 04375119863(05) 05 10 00 10 05 07436 14359 03590119863(06) 06091 05793 06298 11819 0000 07286 14286 03570119863(10) 00 05 05 10 10 07222 15556 03889119863(15) 00 10 05 05 10 09130 24348 06087
119881perp-orthogonality is incompatible with this configuration
of the data so that the VF119907s are undefined Equivalently
on evaluating 1206011
= 1206012
= 2290 deg and 1206011+ 1206012
=
4580 lt 90 deg Corollary 10 asserts that R119881is not positive
definite This appears to be anomalous until we recallthat 119881perp-orthogonality is invariant under rescaling but notrecentering the regressors
54 A Critique We reiterate apparent ambiguity ascribedin the literature to orthogonality A number of widely heldguiding precepts has evolved as enumerated in part in ouropening paragraph and in Section 3 As in Remark 3 theseclearly have been taken to apply verbatim forX
0andX1015840
0X0in
1198720 to be paraphrased in part as follows(P1) Ill-conditioning espouses inflated variances that is
VIFs necessarily equal or exceed unity(P2) 119862perp-orthogonal designs are ldquoidealrdquo in that VIF
119888s for
such designs are all unity see [11] for exampleWe next reassess these precepts as they play out under119881perp and 119862
perp orthogonality in connection with Table 2(C1) For models in 119872
0having X = 0 we reiterate that
the uncentered VIF119906s at expression (3) namely
[36111 38889 11667] overstate adverse effects onvariances of the 119881perp-nonorthogonal array since X1015840
0X0
cannot be diagonal Moreover these conventionalvalues fail to discern that the revised values for VF
119907s
at (5) namely [07704 08889 08889] reflect that[1205730 1205731 1205732] are estimated with greater efficiency in the
119881perp-nonorthogonal array119863(1)
(C2) For 119881perp-orthogonality the claim P1 thus is false by
counterexampleTheVFs at (5) areVarianceDeflationFactors for design 119863(1) where X1015840
1X2= 1 relative to
the 119881perp-orthogonal119863(0)(C3) Additionally the variances Var[120573
0(119909)] Var[120573
1(119909)]
Var[1205732(119909)] in Table 2 are all seen to decrease from
119863(0) [09375 17500 04375] rarr 119863(1) [0722215556 03889] despite the transition from the 119881
perp-orthogonal 119863(0) to the 119881
perp-nonorthogonal 119863(1)Similar trends are confirmed in other ill-conditioneddata sets from the literature
(C4) For 119862perp-orthogonal designs the claim P2 is false by
counterexample Such designs need not be ldquoidealrdquoas 1205730may be estimated more efficiently in a 119862
perp-nonorthogonal design as demonstrated by the VF
119888s
for 1205730at (4) and (22) Similar trends are confirmed
elsewhere as seen subsequently
(C5) VFs for 1205730are critical in prediction where prediction
variances necessarily depend on Var(1205730) especially
for predicting near the origin of the system of coor-dinates
(C6) Dissonance between 119881perp and 119862
perp is seen in Table 2where 119863(06) as 119862
perp-orthogonal with X10158401X2
= 06is the antithesis of 119881perp-orthogonality at 119863(0) whereX10158401X2= 0
(C7) In short these transparent examples serve to dispelthe decades-old mantra that ill-conditioning neces-sarily spawns inflated variances formodels in119872
0 and
they serve to illuminate the contributing structures
6 Orthogonal and Linked Arrays
A genuine 119881perp-orthogonal array was generated as eigenvec-
tors E = [E1E2 E
8] from a positive definite (8 times 8)
matrix (see Table 83 of [11 p377]) using PROC IML of theSAS programming package The columns [X
1X2X3] as the
second through fourth columns of Table 3 comprise the firstthree eigenvectors scaled to length 8 These apply for modelsin 119872119908and 119872
0to be analyzed In addition linked vectors
(Z1Z2Z3) were constructed as Z
1= 119888(09E
1+ 01E
2)
Z2
= 119888(09E2+ 01E
3) and Z
3= 119888(09E
3+ 01E
4) where
119888 = radic8082 Clearly these arrays as listed in Table 3 are notarchaic abstractions as both are amenable to experimentalimplementation
61 The Model 1198720 We consider in turn the orthogonal and
the linked series
611 Orthogonal Data Matrices X10158400X0and (X1015840
0X0)minus1 for the
orthogonal data under model1198720are listed in Table 4 where
variances occupy diagonals of (X10158400X0)minus1 The conventional
uncentered VIF119906s are [120296 102144 112256 105888]
Since X = 0 we find the angle between the constant andthe span of the regressor vectors to be 120579
0= 65748 deg as
in Theorem 4(ii) Moreover the angle between X1and the
span of [1119899X2X3] namely 120579
1= 81670 deg is not 90 deg
because of collinearity with the constant vector despite themutual orthogonality of [X
1X2X3] Observe here thatX1015840
0X0
10 Advances in Decision Sciences
Table 3 Basic 119881perp-orthogonal design X0 = [18 X1 X2 X3] and a linked design Z0 = [18 Z1 Z2 Z3] of order (8 times 4)
Orthogonal (X1X2X3) Linked (Z
1Z2Z3)
18
X1
X2
X3
18
Z1
Z2
Z3
1 minus1084470 0056899 0541778 1 minus1071550 0116381 06534861 minus1090010 minus0040490 0117026 1 minus1087820 minus0027320 01160861 0979050 0002564 2337454 1 0973345 0260677 21996871 minus1090230 0016839 0170711 1 minus1081700 0035588 00706941 0127352 2821941 minus0071670 1 0438204 2796767 minus00707101 1045409 minus0122670 minus1476870 1 1025468 minus0285010 minus16185601 1090803 minus0088970 0043354 1 1074306 minus0083640 01769281 1090739 minus0092300 0108637 1 1073875 minus0079740 0244567
Table 4 Matrices X10158400X0 and (X10158400X0)minus1 for the orthogonal data under model119872
0
X10158400X0
(X10158400X0)minus1
8 10686 25538 17704 01504 minus00201 minus00480 minus0033310686 8 0 0 minus00201 01277 00064 0004425538 0 8 0 minus00480 00064 01403 0010617704 0 0 8 minus00333 00044 00106 01324
already is 119881perp-orthogonal accordingly R119881= X10158400X0 and thus
the VF119907s are all unity
In view of dissonance between 119881perp and 119862
perp orthogonalitythe sums of squares and products for the mean-centered Xare
X1015840B119899X = [
[
78572 minus03411 minus02365
minus03411 71847 minus05652
minus02365 minus05652 76082
]
]
(23)
reflecting nonnegligible negative dependencies to distin-guish the 119881
perp-orthogonal X from the 119862perp-nonorthogonal
matrix Z = B119899X of deviations
To continue we suppose instead that the 119881perp-orthogonalmodel was to be recast as 119862
perp-orthogonal The Referencemodel and its inverse are listed in Table 5 Variance factorsas ratios of diagonal elements of (X1015840
0X0)minus1 in Table 4 to those
of Rminus1119862
in Table 5 are
[VF119888(1205730) VF119888(1205731) VF119888(1205732) VF119888(1205733)]
= [10168 10032 10082 10070]
(24)
reflecting negligible differences in precision This parallelsresults reported in Table 2 comparing 119863(0) to 119863(06) asReference except for larger differences in Table 2 namely[128681225012250]
612 Linked Data For the Linked Data under model 1198720
the matrix Z10158400Z0 follows routinely from Table 3 its inverse(Z10158400Z0)minus1 is listed in Table 6 The conventional uncentered
VIF119906s are now [120320 103408 113760 105480] These
again fail to gage variance inflation since Z10158400Z0cannot be
diagonal From (10) the principal angle between the constantvector and the span of the regressor vectors is 120579
0= 65735
degrees
Remark 15 Observe that the choice for R119862may be posed as
seeking R119862
= U(119909) such that Rminus1119862(2 3) = 00 then solving
numerically for 119909 using Maple for example However thealgorithm in Definition 12(ii) affords a direct solution thatC119908should be diagonal stipulates its off-diagonal element as
X10158401X2minus 35 = 0 so that X1015840
1X2= 06 in R
119862at expression (20)
To illustrate Theorem 4(iii) we compute cos(120579) = (1 minus
110889)12 = 02857 and 120579 = 73398 deg as the angle between
the vectors [X1X2] of (1) when centered to their means For
slopes in 1198720 [9] shows that VIF
119888(120573119895) lt VIF
119906(120573119895) 1 le
119894 le 119896 This follows since their numerators are equal butdenominators are reciprocals of lengths of the centered anduncentered regressor vectors To illustrate numerators areequal for VIF
119906(1205733) = 03889(13) = 11667 and for
VIF119888(1205733) = 03889(128) = 10889 but denominators are
reciprocals of X10158402X2= 3 and sum
5
119895=1(1198832119895
minus 1198832)2= 28
(119894) 119881perp Reference Model To rectify this malapropism we seek
in Definition 8 a model R119881as Reference This is found on
setting all off-diagonal elements of Z10158400Z0to zero excluding
the first row and column Its inverse Rminus1119881
is listed in Table 6The VF
119907s are ratios of diagonal elements of (Z1015840
0Z0)minus1 on the
left to diagonal elements of Rminus1119881
on the right namely
[VF119907(1205730) VF119907(1205731) VF119907(1205732) VF119907(1205733)]
= [09697 09991 09936 09943]
(25)
Against conventional wisdom it is counter-intuitive that[1205730 1205731 1205732 1205733] are all estimated with greater precision in the
119881perp-nonorthogonal model Z1015840
0Z0than in the 119881
perp-orthogonalmodel R
119881of Definition 8 As in Section 54 this serves again
to refute the tenet that ill-conditioning espouses inflated
Advances in Decision Sciences 11
Table 5 Matrices R119862and Rminus1
119862for checking 119862
perp-orthogonality in the orthogonal data X0 under model1198720
R119862
Rminus1119862
8 10686 25538 17704 01479 minus00170 minus00444 minus0029110686 8 03411 02365 minus00170 01273 0 025538 03411 8 05652 minus00444 0 01392 017704 02365 05652 8 minus00291 0 0 01314
Table 6 Matrices (Z10158400Z0)minus1 and Rminus1
119881for checking 119881
perp-orthogonality for the linked data under model1198720
(Z10158400Z0)minus1 Rminus1
119881
01504 minus00202 minus00461 minus00283 01551 minus00261 minus00530 minus00344minus00202 01293 minus000797 00053 minus00261 01294 00089 00058minus00461 minus00079 01422 minus00054 minus00530 00089 01431 00117minus00283 00053 minus00054 01319 minus00344 00058 00117 01326
variances for models in1198720 that is that VFs necessarily equal
or exceed unity(119894119894) 119862
perp Reference Model Further consider variances werethe Linked Data to be centered As in Definition 12(ii)we seek a Reference model R
119862such that C
119908= (W minus
M) is diagonal The required R119862
and its inverse arelisted in Table 7 Since variances in the Linked Data are[015040 012926 014220 013185] from Table 6 and Refer-ence values appear as diagonal elements of Rminus1
119862on the right
in Table 7 their ratios give the centered VF119888s as
[VF119888(1205730) VF119888(1205731) VF119888(1205732) VF119888(1205733)]
= [09920 10049 10047 10030]
(26)
We infer that changes in precision would be negligible ifinstead the Linked Data experiment was to be recast as a 119862perp-orthogonal experiment
As in Remark 15 the reference matrix R119862is constrained
by the first and second moments from X10158400X0and has free
parameters 1199091 1199092 1199093 for the (2 3) (2 4) (3 4) entries
The constraints for 119862perp-orthogonality are Rminus1
119862(2 3) =
Rminus1119862(2 4) = Rminus1
119862(3 4) = 0 Although this system of
equations can be solved with numerical software such asMaple the algorithm stated in Definition 12 easily yields thedirect solution given here
62 The Model 119872119908 For the Linked Data take 119884
119894= 12057311198851198941+
12057321198851198942
+ 12057331198851198943
+ 120598119894 1 le 119894 le 8 as Y = Z120573 + 120598
where the expected response at 1198851
= 1198852
= 1198853
= 0
is 119864(119884) = 00 and there is no occasion to translate theregressorsThe lower right (3 times 3) submatrix ofZ1015840
0Z0(4 times 4)
from 1198720is Z1015840Z its inverse is listed in Table 8 The ratios
of diagonal elements of (Z1015840Z)minus1 to reciprocals of diagonalelements of Z1015840Z are the conventional uncentered VIF
119906s
namely [10123 10247 10123] Equivalently scaling Z1015840Z tohave unit diagonals and inverting gives Cminus1 as in Table 8 itsdiagonal elements are the VIF
119906sThroughout Section 6 these
comprise the only correct interpretation of conventionalVIF119906s as genuine variance ratios
7 Case Study Body Fat Data
71 The Setting Body fat and morphogenic measurementswere reported for 119899 = 20 human subjects in Table D11 of[29 p717] Response and regressor variables are119884 amount ofbody fat X
1 triceps skinfold thickness X
2 thigh circumfer-
ence and X3 mid-arm circumference under119872
0 119884119894= 1205730+
12057311198831+12057321198832+12057331198833+ 120598119894 From the original data as reported
the condition number of X10158400X0is 1039931 and the VIF
119906s are
[2714800 4469419 630998 778703] On scaling columnsof X to equal column lengths X
119894 = 50 119894 = 1 2 3 the
resulting condition number of the revisedX10158400X0is 2 9696518
and the uncenteredVIF119906s are as before from invariance under
scalingAgainst complaints that regressors are remote from their
natural origins we center regressors to [0 0 0] as origin onsubtracting the minimal element from each column as inRemark 2 and scaling these to X
119894 = 50 119894 = 1 2 3
This is motivated on grounds that centered diagnostics apartfrom VIF
119888(1205730) are invariant under shifts and scalings from
Lemma 14 In addition under the new origin [0 0 0] 1205730
assumes prominence as the baseline response against whichchanges due to regressors are to be gaged The original dataare available on the publisherrsquos web page for [29]
Taking these shifted and scaled vectors as columnsof X = [X
1X2X3] in the revised X
0= [1
119899X]
values for X10158400X0
and its inverse are listed in Table 9The condition number of X1015840
0X0
is now 1136969 andits VIF
119906s are [6775617998742782174484] Additionally
from Theorem 4(ii) we recover the angle 1205790= 22592 deg to
quantify collinearity of regressors with the constant vectorIn addition the scaled and centered correlationmatrix for
X is
R = [
[
10 00847 08781
00847 10 01424
08781 01424 10
]
]
(27)
indicating strong linkage between mean-centered versions of(X1X3)Moreover the experimental design is neither119881perp nor
119862perp-orthogonal since neither the lower right (3 times 3) block of
X10158400X0nor the centered matrixC = (X1015840Xminus119899XX1015840) is diagonal
12 Advances in Decision Sciences
Table 7 Matrices R119862and Rminus1
119862for checking 119862
perp-orthogonality in the linked data under model1198720
R119862
Rminus1119862
8 13441 27337 17722 01516 minus00216 minus00484 minus0029113441 8 04593 02977 minus00216 01286 0 027337 04593 8 06056 minus00484 0 01415 017722 02977 06056 8 minus00291 0 0 01314
Table 8 Matrices (Z1015840Z)minus1 and Cminus1 for the linked data under model119872119908
(Z1015840Z)minus1 Cminus1
01265 minus00141 00015 10123 minus01125 00123minus00141 01281 minus00141 minus01125 10247 minus0112500015 minus00141 01265 00123 minus01125 10123
(119894) 119881perp Reference Model To compare this with a 119881
perp-or-thogonal design we turn again to Definition 8 Howeverevaluating the test criterion in Lemma 9(119894) shows that thisconfiguration is not commensurate with 119881
perp-orthogonalityso that the VF
119907s are undefined Comparisons with 119862
perp-orthogonal models are addressed next(119894119894) 119862perp Reference Model As in Definition 12(ii) the centering
matrix is
119899XX1015840 = [
[
188889 189402 187499
189402 189916 188007
187499 188007 186118
]
]
(28)
To check against 119862perp-orthogonality we obtain the matrix Wof Definition 12(ii) on replacing off-diagonal elements of thelower right (3 times 3) submatrix ofX1015840
0X0by corresponding off-
diagonal elements of 119899XX1015840 The result is the matrix R119862and
its inverse as given in Table 10 where diagonal elements ofRminus1119862
are the Reference variances The VF119888s found as ratios
of diagonal elements of (X10158400X0)minus1 relative to those of Rminus1
119862
are listed in Table 12 under G = [0 0 0] the indicator ofDefinition 12(i) for this case
72 Variance Factors and Linkage Traditional gages of ill-conditioning are patently absurd on occasion By conventionVIFs are ldquoall or nonerdquo wherein Reference models entailstrictly diagonal components In practice some regressorsare inextricably linked to get or even visualize orthogonalregressors may go beyond feasible experimental rangesrequire extrapolation beyond practical limits and challengecredulity Response functions so constrained are described in[30] as ldquopicket fencesrdquo In such cases taking C to be diagonalas its 119862perp Reference is moot at best an academic folly abettedin turn by default in standard software packages In shortconventional diagnostics here offer answers to irrelevantquestions Given pairs of regressors irrevocably linked weseek instead to assess effects on variances for other pairsthat could be unlinked by design These comments applyespecially to entries under G = [0 0 0] in Table 12
To proceed we define essential ill-conditioning as regres-sors inextricably linked necessarily to remain so and asnonessential if regressors could be unlinked by design As a
followup to Section 32 for X = 0 we infer that the constantvector is inextricably linked with regressors thus accountingfor essential ill-conditioning not removed by centeringcontrary to claims in [4] Indeed this is the essence ofDefinition 8
Fortunately these limitations may be remedied throughReference models adapted to this purpose This in turnexploits the special notation and trappings of Definition 12(i)as follows In addition toG = [0 0 0] we iterate for indicatorsG = [119892
12 11989213 11989223] taking values in [1 0 0] [0 1 0] [0 0 1]
[1 1 0] [1 0 1] [0 1 1] Values of VF119888s thus obtained are
listed in Table 12 To fix ideas the Reference R119862for G =
[1 0 0] that is for the constraint R119862(2 3) = X1015840
0X0(2 3) =
194533 is given in Table 11 together with its inverse Foundas ratios of diagonal elements of (X1015840
0X0)minus1 relative to those of
Rminus1119862
in Table 11 VF119888s are listed in Table 12 underG = [1 0 0]
In short R119862is ldquoas 119862
perp-orthogonal as it could berdquo given theconstraint Other VF
119888s in Table 12 proceed similarly For all
cases in Table 12 excluding G = [0 1 1] 1205730is estimated
with greater efficiency in the original model than any of thepostulated Reference models
As in Remark 15 the reference matrix R119862
for G =
[1 0 0] is constrained by X10158401X2
= 194533 to be retainedwhereas X1015840
1X3
= 1199091and X1015840
2X3
= 1199092are to be determined
from Rminus1119862(2 4) = Rminus1
119862(3 4) = 0 Although this system
can be solved numerically as noted the algorithm stated inDefinition 12 easily yields the direct solution given here
Recall 1198831 triceps skinfold thickness 119883
2 thigh cir-
cumference and 1198833 mid-arm circumference and from
(27) that (1198831 1198833) are strongly linked Invoking the cutoff
rule [24] for VIFs in excess of 40 it is seen for G =
[0 0 0] in Table 12 that VF119888(1205731) and VF
119888(1205733) are excessive
where [X1X2X3] are forced to be mutually uncorrelated
as Reference Remarkably similar values are reported forG isin [1 0 0] [ 0 0 1] [1 0 1] all gaged against intractableReference models wherein (119883
1 1198833) are postulated to be
uncorrelated however incrediblyOn the other hand VF
119888s for G isin [0 1 0] [1 1 0]
[0 1 1] in Table 12 are likewise comparable where (X1X3)
are allowed to remain linked at X10158401X3= 242362 Here the
VF119888s reflect negligible changes in efficiency of the estimates
Advances in Decision Sciences 13
Table 9 Matrices X10158400X0 and (X10158400X0)minus1 for the body fat data under model119872
0centered to minima and scaled to equal lengths
X10158400X0
(X10158400X0)minus1
20 194365 194893 192934 03388 minus01284 minus01482 minus00203194365 25 194533 242362 minus01284 07199 00299 minus06224194893 194533 25 196832 minus01482 00299 01711 minus00494192934 242362 196832 25 minus00203 minus06224 minus00494 06979
Table 10 Matrices R119862and Rminus1
119862for the body fat data adjusted to give off-diagonal elements the value 00 with code G = [0 0 0]
R119862
Rminus1119862
20 194365 194893 192934 05083 minus01590 minus01622 minus01510194365 25 189402 187499 minus01590 01636 0 0194893 189402 25 188007 minus01622 0 01664 0192934 187499 188007 25 minus01510 0 0 01565
in comparison with the originalX10158400X0 where [X
1X2X3] are
all linked In summary negligible changes in overall efficiencywould accrue on recasting the experiment so that (X
1X2)
and (X2X3) were pairwise uncorrelated
Table 12 conveys further useful information on notingthat VFs as ratios are multiplicative and divisible Forexample take the model codedG = [1 1 0] whereX1015840
1X2and
X10158401X3are retained at their initial values under X1015840
0X0 namely
194533 and 242362 from Table 9 with (X2X3) unlinked
Now taking G = [0 1 0] as reference we extract the VFson dividing elements in Table 12 at G = [0 1 0] by those ofG = [1 1 0] The result is
[VF (1205730) VF (120573
1) VF (120573
2) VF (120573
3)]
= [09196
0984810073
1005710282
1020810208
10208]
= [09338 10017 10072 10000]
(29)
8 Conclusions
Our goal is clarity in the actual workings of VIFs as ill-conditioning diagnostics thus to identify and rectifymiscon-ceptions regarding the use of VIFs for models X
0= [1119899X]
with intercept Would-be arbiters for ldquocorrectrdquo usage havedivided between VIF
119888s for mean-centered regressors and
VIF119906s for uncentered data We take issue with both Conven-
tional but clouded vision holds that (i) VIF119906s gage variance
inflation owing to nonorthogonal columns of X0as vectors
in R119899 (ii) VIFs ge 10 and (iii) models having orthogonalmean-centered regressors are ldquoidealrdquo in possessing VIF
119888s of
value unity Reprising Remark 3 these properties widely andcorrectly understood for models in119872
119908 appear to have been
expropriated without verification for models in1198720hence the
anomaliesAccordingly we have distinguished vector-space orthog-
onality (119881perp) of columns of X0 in contrast to orthogonality
(119862perp) of mean-centered columns of X A key feature is theconstruction of second-moment arrays as Reference models(R119881R119862) encoding orthogonalities of these types and their
Variance Factors as VF119907rsquos and VF
119888rsquos respectively Contrary
to convention we demonstrate analytically and through casestudies that (i1015840) VIF
119906s do not gage variance inflation (ii1015840)
VFs ge 10 need not hold whereas VF lt 10 representsVariance Deflation and (iii1015840) models having orthogonalcentered regressors are not necessarily ldquoidealrdquo In particularvariance deflation occurs when ill-conditioned data yieldsmaller variances than corresponding orthogonal surrogates
In short our studies call out to adjudicate the prolongeddispute regarding centered and uncentered diagnostics formodels in119872
0 Our summary conclusions follow
(S1) The choice of a model with or without intercept isa substantive matter in the context of each exper-imental paradigm That choice is fixed beforehandin a particular setting is structural and is beyondthe scope of the present study Snee and Marquardt[10] correctly assert that VIFs are fully creditedand remain undisputed in models without interceptthese quantities unambiguously retain the meaningsoriginally intended namely as ratios of variances togage effects of nonorthogonal regressors For modelswith intercept the choice is between centered anduncentered diagnostics which is to be weighed asfollows
(S2) ConventionalVIF119888s apply incompletely to slopes only
not necessarily grasping fully that a system is illconditioned Of particular concern is missing evi-dence on collinearity of regressors with the interceptOn the other hand if 119862
perp-orthogonality pertainsthen VF
119888s may supplant VIF
119888s as applicable to all of
[1205730 1205731 120573
119896] Moreover the latter may reveal that
VF119888(1205730) lt 10 as in cited examples and of special
import in predicting near the origin of the coordinatesystem
(S3) Our VF119907s capture correctly the concept apparently
intended by conventional VIF119906s Specifically if 119881perp-
orthogonality pertains then VF119907s as genuine ratios
of variances may supplant VIF119906s as now discredited
gages for variance inflation(S4) Returning to Section 32 we subscribe fully but for
different reasons to Belsleyrsquos and othersrsquo contention
14 Advances in Decision Sciences
Table 11 Matrices R119862and Rminus1
119862for the body fat data adjusted to give off-diagonal elements the value 00 except X10158401X2 code G = [1 0 0]
R119862
Rminus1119862
20 194365 194893 192934 04839 minus01465 minus01497 minus01510194365 25 194533 187499 minus01465 01648 minus00141 0194893 194533 25 188007 minus01497 minus00141 01676 0192934 187499 188007 25 minus01510 0 0 01565
Table 12 Summary VF119888s for the body fat data centered to minima and scaled to common lengths identified with varying G = [119892
12 11989213 11989223]
from Definition 12
Elements of G VF119888s
11989212
11989213
11989223
1205730
1205731
1205732
1205733
0 0 0 06665 43996 10282 44586
1 0 0 07002 43681 10208 44586
0 1 0 09196 10073 10282 10208
0 0 1 07201 43996 10073 43681
1 1 0 09848 10057 10208 10208
1 0 1 07595 43681 10003 43681
0 1 1 10248 10073 10073 10160
that uncentered VIF119906s are essential in assessing ill-
conditioned data In the geometry of ill-conditioningthese should be retained in the pantheon of regres-sion diagnostics not as now debunked gages ofvariance inflation but as more compelling angularmeasures of the collinearity of each column of X
0=
[1119899X1X2 X
119896] with the subspace spanned by
remaining columns(S5) Specifically the degree of collinearity of regressors
with the intercept is quantified by 1205790deg which
if small is known to ldquocorrupt the estimates of allparameters in the model whether or not the interceptis itself of interest and whether or not the data havebeen centeredrdquo [21 p90]This in turnwould appear toelevate 120579
0in importance as a critical ill-conditioning
diagnostic(S6) In contrast Theorem 4(iii) identifies conventional
VIF119888s with angles between a centered regressor and
the subspace spanned by remaining centered regres-sors For example
1205791= arccos([
[
1 minus1
VIFc ( ldquo1205731)]
]
12
) (30)
is the angle between the centered vector Z1= B119899X1
and the remaining centered vectors These lie in thelinear subspace of R119899 comprising the orthogonalcomplement to 1
119899isin R119899 and thus bear on the
geometry of ill-conditioning in this subspace(S7) Whereas conventional VIFs are ldquoall or nonerdquo to
be gaged against strictly diagonal components ourReference models enable unlinking what may beunlinked while leaving other pairs of regressorslinked This enables us to assess effects on variances
attributed to allowable partial dependencies amongsome but not other regressors
In conclusion the foregoing summary points to limita-tions in modern software packages Minitab v15 119878119875119878119878 v19and 119878119860119878 v92 are standard software packages which computecollinearity diagnostics returning with the default optionsthe centered VIFs VIF
119888(120573119895) 1 le 119895 le 119896 To compute
the uncentered VIF119906s one needs to define a variable for
the constant term and perform a linear regression using theoptions of fitting without an intercept On the other handcomputations for VF
119888s and VF
119907s as introduced here require
additional if straightforward programming for exampleMaple and PROC IML of the SAS System as utilized here
References
[1] D A Belsley ldquoDemeaning conditioning diagnostics throughcenteringrdquo The American Statistician vol 38 no 2 pp 73ndash771984
[2] E R Mansfield and B P Helms ldquoDetecting multicollinearityrdquoThe American Statistician vol 36 no 3 pp 158ndash160 1982
[3] S D Simon and J P Lesage ldquoThe impact of collinearityinvolving the intercept term on the numerical accuracy ofregressionrdquo Computer Science in Economics and Managementvol 1 no 2 pp 137ndash152 1988
[4] D W Marquardt and R D Snee ldquoRidge regression in practicerdquoThe American Statistician vol 29 no 1 pp 3ndash20 1975
[5] K N Berk ldquoTolerance and condition in regression computa-tionsrdquo Journal of the American Statistical Association vol 72 no360 part 1 pp 863ndash866 1977
[6] A E Beaton D Rubin and J Barone ldquoThe acceptability ofregression solutions another look at computational accuracyrdquoJournal of the American Statistical Association vol 71 no 353pp 158ndash168 1976
Advances in Decision Sciences 15
[7] R B Davies and B Hutton ldquoThe effect of errors in theindependent variables in linear regressionrdquo Biometrika vol 62no 2 pp 383ndash391 1975
[8] D W Marquardt ldquoGeneralized inverses ridge regressionbiased linear estimation and nonlinear estimationrdquo Technomet-rics vol 12 no 3 pp 591ndash612 1970
[9] G W Stewart ldquoCollinearity and least squares regressionrdquoStatistical Science vol 2 no 1 pp 68ndash84 1987
[10] R D Snee and D W Marquardt ldquoComment collinearitydiagnostics depend on the domain of prediction themodel andthe datardquo The American Statistician vol 38 no 2 pp 83ndash871984
[11] R HMyers Classical andModern Regression with ApplicationsPWS-Kent Boston Mass USA 2nd edition 1990
[12] D A Belsley E Kuh and R E Welsch Regression DiagnosticsIdentifying Influential Data and Sources of Collinearity JohnWiley amp Sons New York NY USA 1980
[13] A van der Sluis ldquoCondition numbers and equilibration ofmatricesrdquo Numerische Mathematik vol 14 pp 14ndash23 1969
[14] G C S Wang ldquoHow to handle multicollinearity in regressionmodelingrdquo Journal of Business Forecasting Methods amp Systemvol 15 no 1 pp 23ndash27 1996
[15] G C S Wang and C L Jain Regression Analysis Modeling ampForecasting Graceway Flushing NY USA 2003
[16] N Adnan M H Ahmad and R Adnan ldquoA comparative studyon some methods for handling multicollinearity problemsrdquoMatematika UTM vol 22 pp 109ndash119 2006
[17] D R Jensen and D E Ramirez ldquoAnomalies in the foundationsof ridge regressionrdquo International Statistical Review vol 76 pp89ndash105 2008
[18] D R Jensen and D E Ramirez ldquoSurrogate models in ill-conditioned systemsrdquo Journal of Statistical Planning and Infer-ence vol 140 no 7 pp 2069ndash2077 2010
[19] P F Velleman and R E Welsch ldquoEfficient computing ofregression diagnosticsrdquo The American Statistician vol 35 no4 pp 234ndash242 1981
[20] R F Gunst ldquoRegression analysis with multicollinear predictorvariables definition detection and effectsrdquo Communications inStatistics A vol 12 no 19 pp 2217ndash2260 1983
[21] G W Stewart ldquoComment well-conditioned collinearityindicesrdquo Statistical Science vol 2 no 1 pp 86ndash91 1987
[22] D A Belsley ldquoCentering the constant first-differencing andassessing conditioningrdquo in Model Reliability D A Belsley andE Kuh Eds pp 117ndash153 MIT Press Cambridge Mass USA1986
[23] G Smith and F Campbell ldquoA critique of some ridge regressionmethodsrdquo Journal of the American Statistical Association vol 75no 369 pp 74ndash103 1980
[24] R M Orsquobrien ldquoA caution regarding rules of thumb for varianceinflation factorsrdquoQualityampQuantity vol 41 no 5 pp 673ndash6902007
[25] R F Gunst ldquoComment toward a balanced assessment ofcollinearity diagnosticsrdquo The American Statistician vol 38 pp79ndash82 1984
[26] F A Graybill Introduction to Matrices with Applications inStatistics Wadsworth Belmont Calif USA 1969
[27] D Sengupta and P Bhimasankaram ldquoOn the roles of observa-tions in collinearity in the linearmodelrdquo Journal of the AmericanStatistical Association vol 92 no 439 pp 1024ndash1032 1997
[28] J O van Vuuren and W J Conradie ldquoDiagnostics and plotsfor identifying collinearity-influential observations and for clas-sifying multivariate outliersrdquo South African Statistical Journalvol 34 no 2 pp 135ndash161 2000
[29] R R HockingMethods and Applications of LinearModels JohnWiley amp Sons Hoboken NJ USA 2nd edition 2003
[30] R R Hocking and O J Pendleton ldquoThe regression dilemmardquoCommunications in Statistics vol 12 pp 497ndash527 1983
Advances in Decision Sciences 9
Table 2 Designs 119863(119909) 119909 isin [minus05 00 05 06 10 15] obtained by substitutingX01= [11988311 11988312 11988313 11988314 11988315]1015840 in [15 ∙X2] of order (5times3)
and corresponding variances Var (1205730)Var (120573
1)Var (120573
2)
Design 11988311
11988312
11988313
11988314
11988315
Var (1205730) Var (120573
1) Var (120573
2)
119863(minus05) 10 00 05 10 05 19333 37333 09333119863(00) 10 05 05 10 00 09375 17500 04375119863(05) 05 10 00 10 05 07436 14359 03590119863(06) 06091 05793 06298 11819 0000 07286 14286 03570119863(10) 00 05 05 10 10 07222 15556 03889119863(15) 00 10 05 05 10 09130 24348 06087
119881perp-orthogonality is incompatible with this configuration
of the data so that the VF119907s are undefined Equivalently
on evaluating 1206011
= 1206012
= 2290 deg and 1206011+ 1206012
=
4580 lt 90 deg Corollary 10 asserts that R119881is not positive
definite This appears to be anomalous until we recallthat 119881perp-orthogonality is invariant under rescaling but notrecentering the regressors
54 A Critique We reiterate apparent ambiguity ascribedin the literature to orthogonality A number of widely heldguiding precepts has evolved as enumerated in part in ouropening paragraph and in Section 3 As in Remark 3 theseclearly have been taken to apply verbatim forX
0andX1015840
0X0in
1198720 to be paraphrased in part as follows(P1) Ill-conditioning espouses inflated variances that is
VIFs necessarily equal or exceed unity(P2) 119862perp-orthogonal designs are ldquoidealrdquo in that VIF
119888s for
such designs are all unity see [11] for exampleWe next reassess these precepts as they play out under119881perp and 119862
perp orthogonality in connection with Table 2(C1) For models in 119872
0having X = 0 we reiterate that
the uncentered VIF119906s at expression (3) namely
[36111 38889 11667] overstate adverse effects onvariances of the 119881perp-nonorthogonal array since X1015840
0X0
cannot be diagonal Moreover these conventionalvalues fail to discern that the revised values for VF
119907s
at (5) namely [07704 08889 08889] reflect that[1205730 1205731 1205732] are estimated with greater efficiency in the
119881perp-nonorthogonal array119863(1)
(C2) For 119881perp-orthogonality the claim P1 thus is false by
counterexampleTheVFs at (5) areVarianceDeflationFactors for design 119863(1) where X1015840
1X2= 1 relative to
the 119881perp-orthogonal119863(0)(C3) Additionally the variances Var[120573
0(119909)] Var[120573
1(119909)]
Var[1205732(119909)] in Table 2 are all seen to decrease from
119863(0) [09375 17500 04375] rarr 119863(1) [0722215556 03889] despite the transition from the 119881
perp-orthogonal 119863(0) to the 119881
perp-nonorthogonal 119863(1)Similar trends are confirmed in other ill-conditioneddata sets from the literature
(C4) For 119862perp-orthogonal designs the claim P2 is false by
counterexample Such designs need not be ldquoidealrdquoas 1205730may be estimated more efficiently in a 119862
perp-nonorthogonal design as demonstrated by the VF
119888s
for 1205730at (4) and (22) Similar trends are confirmed
elsewhere as seen subsequently
(C5) VFs for 1205730are critical in prediction where prediction
variances necessarily depend on Var(1205730) especially
for predicting near the origin of the system of coor-dinates
(C6) Dissonance between 119881perp and 119862
perp is seen in Table 2where 119863(06) as 119862
perp-orthogonal with X10158401X2
= 06is the antithesis of 119881perp-orthogonality at 119863(0) whereX10158401X2= 0
(C7) In short these transparent examples serve to dispelthe decades-old mantra that ill-conditioning neces-sarily spawns inflated variances formodels in119872
0 and
they serve to illuminate the contributing structures
6 Orthogonal and Linked Arrays
A genuine 119881perp-orthogonal array was generated as eigenvec-
tors E = [E1E2 E
8] from a positive definite (8 times 8)
matrix (see Table 83 of [11 p377]) using PROC IML of theSAS programming package The columns [X
1X2X3] as the
second through fourth columns of Table 3 comprise the firstthree eigenvectors scaled to length 8 These apply for modelsin 119872119908and 119872
0to be analyzed In addition linked vectors
(Z1Z2Z3) were constructed as Z
1= 119888(09E
1+ 01E
2)
Z2
= 119888(09E2+ 01E
3) and Z
3= 119888(09E
3+ 01E
4) where
119888 = radic8082 Clearly these arrays as listed in Table 3 are notarchaic abstractions as both are amenable to experimentalimplementation
61 The Model 1198720 We consider in turn the orthogonal and
the linked series
611 Orthogonal Data Matrices X10158400X0and (X1015840
0X0)minus1 for the
orthogonal data under model1198720are listed in Table 4 where
variances occupy diagonals of (X10158400X0)minus1 The conventional
uncentered VIF119906s are [120296 102144 112256 105888]
Since X = 0 we find the angle between the constant andthe span of the regressor vectors to be 120579
0= 65748 deg as
in Theorem 4(ii) Moreover the angle between X1and the
span of [1119899X2X3] namely 120579
1= 81670 deg is not 90 deg
because of collinearity with the constant vector despite themutual orthogonality of [X
1X2X3] Observe here thatX1015840
0X0
10 Advances in Decision Sciences
Table 3 Basic 119881perp-orthogonal design X0 = [18 X1 X2 X3] and a linked design Z0 = [18 Z1 Z2 Z3] of order (8 times 4)
Orthogonal (X1X2X3) Linked (Z
1Z2Z3)
18
X1
X2
X3
18
Z1
Z2
Z3
1 minus1084470 0056899 0541778 1 minus1071550 0116381 06534861 minus1090010 minus0040490 0117026 1 minus1087820 minus0027320 01160861 0979050 0002564 2337454 1 0973345 0260677 21996871 minus1090230 0016839 0170711 1 minus1081700 0035588 00706941 0127352 2821941 minus0071670 1 0438204 2796767 minus00707101 1045409 minus0122670 minus1476870 1 1025468 minus0285010 minus16185601 1090803 minus0088970 0043354 1 1074306 minus0083640 01769281 1090739 minus0092300 0108637 1 1073875 minus0079740 0244567
Table 4 Matrices X10158400X0 and (X10158400X0)minus1 for the orthogonal data under model119872
0
X10158400X0
(X10158400X0)minus1
8 10686 25538 17704 01504 minus00201 minus00480 minus0033310686 8 0 0 minus00201 01277 00064 0004425538 0 8 0 minus00480 00064 01403 0010617704 0 0 8 minus00333 00044 00106 01324
already is 119881perp-orthogonal accordingly R119881= X10158400X0 and thus
the VF119907s are all unity
In view of dissonance between 119881perp and 119862
perp orthogonalitythe sums of squares and products for the mean-centered Xare
X1015840B119899X = [
[
78572 minus03411 minus02365
minus03411 71847 minus05652
minus02365 minus05652 76082
]
]
(23)
reflecting nonnegligible negative dependencies to distin-guish the 119881
perp-orthogonal X from the 119862perp-nonorthogonal
matrix Z = B119899X of deviations
To continue we suppose instead that the 119881perp-orthogonalmodel was to be recast as 119862
perp-orthogonal The Referencemodel and its inverse are listed in Table 5 Variance factorsas ratios of diagonal elements of (X1015840
0X0)minus1 in Table 4 to those
of Rminus1119862
in Table 5 are
[VF119888(1205730) VF119888(1205731) VF119888(1205732) VF119888(1205733)]
= [10168 10032 10082 10070]
(24)
reflecting negligible differences in precision This parallelsresults reported in Table 2 comparing 119863(0) to 119863(06) asReference except for larger differences in Table 2 namely[128681225012250]
612 Linked Data For the Linked Data under model 1198720
the matrix Z10158400Z0 follows routinely from Table 3 its inverse(Z10158400Z0)minus1 is listed in Table 6 The conventional uncentered
VIF119906s are now [120320 103408 113760 105480] These
again fail to gage variance inflation since Z10158400Z0cannot be
diagonal From (10) the principal angle between the constantvector and the span of the regressor vectors is 120579
0= 65735
degrees
Remark 15 Observe that the choice for R119862may be posed as
seeking R119862
= U(119909) such that Rminus1119862(2 3) = 00 then solving
numerically for 119909 using Maple for example However thealgorithm in Definition 12(ii) affords a direct solution thatC119908should be diagonal stipulates its off-diagonal element as
X10158401X2minus 35 = 0 so that X1015840
1X2= 06 in R
119862at expression (20)
To illustrate Theorem 4(iii) we compute cos(120579) = (1 minus
110889)12 = 02857 and 120579 = 73398 deg as the angle between
the vectors [X1X2] of (1) when centered to their means For
slopes in 1198720 [9] shows that VIF
119888(120573119895) lt VIF
119906(120573119895) 1 le
119894 le 119896 This follows since their numerators are equal butdenominators are reciprocals of lengths of the centered anduncentered regressor vectors To illustrate numerators areequal for VIF
119906(1205733) = 03889(13) = 11667 and for
VIF119888(1205733) = 03889(128) = 10889 but denominators are
reciprocals of X10158402X2= 3 and sum
5
119895=1(1198832119895
minus 1198832)2= 28
(119894) 119881perp Reference Model To rectify this malapropism we seek
in Definition 8 a model R119881as Reference This is found on
setting all off-diagonal elements of Z10158400Z0to zero excluding
the first row and column Its inverse Rminus1119881
is listed in Table 6The VF
119907s are ratios of diagonal elements of (Z1015840
0Z0)minus1 on the
left to diagonal elements of Rminus1119881
on the right namely
[VF119907(1205730) VF119907(1205731) VF119907(1205732) VF119907(1205733)]
= [09697 09991 09936 09943]
(25)
Against conventional wisdom it is counter-intuitive that[1205730 1205731 1205732 1205733] are all estimated with greater precision in the
119881perp-nonorthogonal model Z1015840
0Z0than in the 119881
perp-orthogonalmodel R
119881of Definition 8 As in Section 54 this serves again
to refute the tenet that ill-conditioning espouses inflated
Advances in Decision Sciences 11
Table 5 Matrices R119862and Rminus1
119862for checking 119862
perp-orthogonality in the orthogonal data X0 under model1198720
R119862
Rminus1119862
8 10686 25538 17704 01479 minus00170 minus00444 minus0029110686 8 03411 02365 minus00170 01273 0 025538 03411 8 05652 minus00444 0 01392 017704 02365 05652 8 minus00291 0 0 01314
Table 6 Matrices (Z10158400Z0)minus1 and Rminus1
119881for checking 119881
perp-orthogonality for the linked data under model1198720
(Z10158400Z0)minus1 Rminus1
119881
01504 minus00202 minus00461 minus00283 01551 minus00261 minus00530 minus00344minus00202 01293 minus000797 00053 minus00261 01294 00089 00058minus00461 minus00079 01422 minus00054 minus00530 00089 01431 00117minus00283 00053 minus00054 01319 minus00344 00058 00117 01326
variances for models in1198720 that is that VFs necessarily equal
or exceed unity(119894119894) 119862
perp Reference Model Further consider variances werethe Linked Data to be centered As in Definition 12(ii)we seek a Reference model R
119862such that C
119908= (W minus
M) is diagonal The required R119862
and its inverse arelisted in Table 7 Since variances in the Linked Data are[015040 012926 014220 013185] from Table 6 and Refer-ence values appear as diagonal elements of Rminus1
119862on the right
in Table 7 their ratios give the centered VF119888s as
[VF119888(1205730) VF119888(1205731) VF119888(1205732) VF119888(1205733)]
= [09920 10049 10047 10030]
(26)
We infer that changes in precision would be negligible ifinstead the Linked Data experiment was to be recast as a 119862perp-orthogonal experiment
As in Remark 15 the reference matrix R119862is constrained
by the first and second moments from X10158400X0and has free
parameters 1199091 1199092 1199093 for the (2 3) (2 4) (3 4) entries
The constraints for 119862perp-orthogonality are Rminus1
119862(2 3) =
Rminus1119862(2 4) = Rminus1
119862(3 4) = 0 Although this system of
equations can be solved with numerical software such asMaple the algorithm stated in Definition 12 easily yields thedirect solution given here
62 The Model 119872119908 For the Linked Data take 119884
119894= 12057311198851198941+
12057321198851198942
+ 12057331198851198943
+ 120598119894 1 le 119894 le 8 as Y = Z120573 + 120598
where the expected response at 1198851
= 1198852
= 1198853
= 0
is 119864(119884) = 00 and there is no occasion to translate theregressorsThe lower right (3 times 3) submatrix ofZ1015840
0Z0(4 times 4)
from 1198720is Z1015840Z its inverse is listed in Table 8 The ratios
of diagonal elements of (Z1015840Z)minus1 to reciprocals of diagonalelements of Z1015840Z are the conventional uncentered VIF
119906s
namely [10123 10247 10123] Equivalently scaling Z1015840Z tohave unit diagonals and inverting gives Cminus1 as in Table 8 itsdiagonal elements are the VIF
119906sThroughout Section 6 these
comprise the only correct interpretation of conventionalVIF119906s as genuine variance ratios
7 Case Study Body Fat Data
71 The Setting Body fat and morphogenic measurementswere reported for 119899 = 20 human subjects in Table D11 of[29 p717] Response and regressor variables are119884 amount ofbody fat X
1 triceps skinfold thickness X
2 thigh circumfer-
ence and X3 mid-arm circumference under119872
0 119884119894= 1205730+
12057311198831+12057321198832+12057331198833+ 120598119894 From the original data as reported
the condition number of X10158400X0is 1039931 and the VIF
119906s are
[2714800 4469419 630998 778703] On scaling columnsof X to equal column lengths X
119894 = 50 119894 = 1 2 3 the
resulting condition number of the revisedX10158400X0is 2 9696518
and the uncenteredVIF119906s are as before from invariance under
scalingAgainst complaints that regressors are remote from their
natural origins we center regressors to [0 0 0] as origin onsubtracting the minimal element from each column as inRemark 2 and scaling these to X
119894 = 50 119894 = 1 2 3
This is motivated on grounds that centered diagnostics apartfrom VIF
119888(1205730) are invariant under shifts and scalings from
Lemma 14 In addition under the new origin [0 0 0] 1205730
assumes prominence as the baseline response against whichchanges due to regressors are to be gaged The original dataare available on the publisherrsquos web page for [29]
Taking these shifted and scaled vectors as columnsof X = [X
1X2X3] in the revised X
0= [1
119899X]
values for X10158400X0
and its inverse are listed in Table 9The condition number of X1015840
0X0
is now 1136969 andits VIF
119906s are [6775617998742782174484] Additionally
from Theorem 4(ii) we recover the angle 1205790= 22592 deg to
quantify collinearity of regressors with the constant vectorIn addition the scaled and centered correlationmatrix for
X is
R = [
[
10 00847 08781
00847 10 01424
08781 01424 10
]
]
(27)
indicating strong linkage between mean-centered versions of(X1X3)Moreover the experimental design is neither119881perp nor
119862perp-orthogonal since neither the lower right (3 times 3) block of
X10158400X0nor the centered matrixC = (X1015840Xminus119899XX1015840) is diagonal
12 Advances in Decision Sciences
Table 7 Matrices R119862and Rminus1
119862for checking 119862
perp-orthogonality in the linked data under model1198720
R119862
Rminus1119862
8 13441 27337 17722 01516 minus00216 minus00484 minus0029113441 8 04593 02977 minus00216 01286 0 027337 04593 8 06056 minus00484 0 01415 017722 02977 06056 8 minus00291 0 0 01314
Table 8 Matrices (Z1015840Z)minus1 and Cminus1 for the linked data under model119872119908
(Z1015840Z)minus1 Cminus1
01265 minus00141 00015 10123 minus01125 00123minus00141 01281 minus00141 minus01125 10247 minus0112500015 minus00141 01265 00123 minus01125 10123
(119894) 119881perp Reference Model To compare this with a 119881
perp-or-thogonal design we turn again to Definition 8 Howeverevaluating the test criterion in Lemma 9(119894) shows that thisconfiguration is not commensurate with 119881
perp-orthogonalityso that the VF
119907s are undefined Comparisons with 119862
perp-orthogonal models are addressed next(119894119894) 119862perp Reference Model As in Definition 12(ii) the centering
matrix is
119899XX1015840 = [
[
188889 189402 187499
189402 189916 188007
187499 188007 186118
]
]
(28)
To check against 119862perp-orthogonality we obtain the matrix Wof Definition 12(ii) on replacing off-diagonal elements of thelower right (3 times 3) submatrix ofX1015840
0X0by corresponding off-
diagonal elements of 119899XX1015840 The result is the matrix R119862and
its inverse as given in Table 10 where diagonal elements ofRminus1119862
are the Reference variances The VF119888s found as ratios
of diagonal elements of (X10158400X0)minus1 relative to those of Rminus1
119862
are listed in Table 12 under G = [0 0 0] the indicator ofDefinition 12(i) for this case
72 Variance Factors and Linkage Traditional gages of ill-conditioning are patently absurd on occasion By conventionVIFs are ldquoall or nonerdquo wherein Reference models entailstrictly diagonal components In practice some regressorsare inextricably linked to get or even visualize orthogonalregressors may go beyond feasible experimental rangesrequire extrapolation beyond practical limits and challengecredulity Response functions so constrained are described in[30] as ldquopicket fencesrdquo In such cases taking C to be diagonalas its 119862perp Reference is moot at best an academic folly abettedin turn by default in standard software packages In shortconventional diagnostics here offer answers to irrelevantquestions Given pairs of regressors irrevocably linked weseek instead to assess effects on variances for other pairsthat could be unlinked by design These comments applyespecially to entries under G = [0 0 0] in Table 12
To proceed we define essential ill-conditioning as regres-sors inextricably linked necessarily to remain so and asnonessential if regressors could be unlinked by design As a
followup to Section 32 for X = 0 we infer that the constantvector is inextricably linked with regressors thus accountingfor essential ill-conditioning not removed by centeringcontrary to claims in [4] Indeed this is the essence ofDefinition 8
Fortunately these limitations may be remedied throughReference models adapted to this purpose This in turnexploits the special notation and trappings of Definition 12(i)as follows In addition toG = [0 0 0] we iterate for indicatorsG = [119892
12 11989213 11989223] taking values in [1 0 0] [0 1 0] [0 0 1]
[1 1 0] [1 0 1] [0 1 1] Values of VF119888s thus obtained are
listed in Table 12 To fix ideas the Reference R119862for G =
[1 0 0] that is for the constraint R119862(2 3) = X1015840
0X0(2 3) =
194533 is given in Table 11 together with its inverse Foundas ratios of diagonal elements of (X1015840
0X0)minus1 relative to those of
Rminus1119862
in Table 11 VF119888s are listed in Table 12 underG = [1 0 0]
In short R119862is ldquoas 119862
perp-orthogonal as it could berdquo given theconstraint Other VF
119888s in Table 12 proceed similarly For all
cases in Table 12 excluding G = [0 1 1] 1205730is estimated
with greater efficiency in the original model than any of thepostulated Reference models
As in Remark 15 the reference matrix R119862
for G =
[1 0 0] is constrained by X10158401X2
= 194533 to be retainedwhereas X1015840
1X3
= 1199091and X1015840
2X3
= 1199092are to be determined
from Rminus1119862(2 4) = Rminus1
119862(3 4) = 0 Although this system
can be solved numerically as noted the algorithm stated inDefinition 12 easily yields the direct solution given here
Recall 1198831 triceps skinfold thickness 119883
2 thigh cir-
cumference and 1198833 mid-arm circumference and from
(27) that (1198831 1198833) are strongly linked Invoking the cutoff
rule [24] for VIFs in excess of 40 it is seen for G =
[0 0 0] in Table 12 that VF119888(1205731) and VF
119888(1205733) are excessive
where [X1X2X3] are forced to be mutually uncorrelated
as Reference Remarkably similar values are reported forG isin [1 0 0] [ 0 0 1] [1 0 1] all gaged against intractableReference models wherein (119883
1 1198833) are postulated to be
uncorrelated however incrediblyOn the other hand VF
119888s for G isin [0 1 0] [1 1 0]
[0 1 1] in Table 12 are likewise comparable where (X1X3)
are allowed to remain linked at X10158401X3= 242362 Here the
VF119888s reflect negligible changes in efficiency of the estimates
Advances in Decision Sciences 13
Table 9 Matrices X10158400X0 and (X10158400X0)minus1 for the body fat data under model119872
0centered to minima and scaled to equal lengths
X10158400X0
(X10158400X0)minus1
20 194365 194893 192934 03388 minus01284 minus01482 minus00203194365 25 194533 242362 minus01284 07199 00299 minus06224194893 194533 25 196832 minus01482 00299 01711 minus00494192934 242362 196832 25 minus00203 minus06224 minus00494 06979
Table 10 Matrices R119862and Rminus1
119862for the body fat data adjusted to give off-diagonal elements the value 00 with code G = [0 0 0]
R119862
Rminus1119862
20 194365 194893 192934 05083 minus01590 minus01622 minus01510194365 25 189402 187499 minus01590 01636 0 0194893 189402 25 188007 minus01622 0 01664 0192934 187499 188007 25 minus01510 0 0 01565
in comparison with the originalX10158400X0 where [X
1X2X3] are
all linked In summary negligible changes in overall efficiencywould accrue on recasting the experiment so that (X
1X2)
and (X2X3) were pairwise uncorrelated
Table 12 conveys further useful information on notingthat VFs as ratios are multiplicative and divisible Forexample take the model codedG = [1 1 0] whereX1015840
1X2and
X10158401X3are retained at their initial values under X1015840
0X0 namely
194533 and 242362 from Table 9 with (X2X3) unlinked
Now taking G = [0 1 0] as reference we extract the VFson dividing elements in Table 12 at G = [0 1 0] by those ofG = [1 1 0] The result is
[VF (1205730) VF (120573
1) VF (120573
2) VF (120573
3)]
= [09196
0984810073
1005710282
1020810208
10208]
= [09338 10017 10072 10000]
(29)
8 Conclusions
Our goal is clarity in the actual workings of VIFs as ill-conditioning diagnostics thus to identify and rectifymiscon-ceptions regarding the use of VIFs for models X
0= [1119899X]
with intercept Would-be arbiters for ldquocorrectrdquo usage havedivided between VIF
119888s for mean-centered regressors and
VIF119906s for uncentered data We take issue with both Conven-
tional but clouded vision holds that (i) VIF119906s gage variance
inflation owing to nonorthogonal columns of X0as vectors
in R119899 (ii) VIFs ge 10 and (iii) models having orthogonalmean-centered regressors are ldquoidealrdquo in possessing VIF
119888s of
value unity Reprising Remark 3 these properties widely andcorrectly understood for models in119872
119908 appear to have been
expropriated without verification for models in1198720hence the
anomaliesAccordingly we have distinguished vector-space orthog-
onality (119881perp) of columns of X0 in contrast to orthogonality
(119862perp) of mean-centered columns of X A key feature is theconstruction of second-moment arrays as Reference models(R119881R119862) encoding orthogonalities of these types and their
Variance Factors as VF119907rsquos and VF
119888rsquos respectively Contrary
to convention we demonstrate analytically and through casestudies that (i1015840) VIF
119906s do not gage variance inflation (ii1015840)
VFs ge 10 need not hold whereas VF lt 10 representsVariance Deflation and (iii1015840) models having orthogonalcentered regressors are not necessarily ldquoidealrdquo In particularvariance deflation occurs when ill-conditioned data yieldsmaller variances than corresponding orthogonal surrogates
In short our studies call out to adjudicate the prolongeddispute regarding centered and uncentered diagnostics formodels in119872
0 Our summary conclusions follow
(S1) The choice of a model with or without intercept isa substantive matter in the context of each exper-imental paradigm That choice is fixed beforehandin a particular setting is structural and is beyondthe scope of the present study Snee and Marquardt[10] correctly assert that VIFs are fully creditedand remain undisputed in models without interceptthese quantities unambiguously retain the meaningsoriginally intended namely as ratios of variances togage effects of nonorthogonal regressors For modelswith intercept the choice is between centered anduncentered diagnostics which is to be weighed asfollows
(S2) ConventionalVIF119888s apply incompletely to slopes only
not necessarily grasping fully that a system is illconditioned Of particular concern is missing evi-dence on collinearity of regressors with the interceptOn the other hand if 119862
perp-orthogonality pertainsthen VF
119888s may supplant VIF
119888s as applicable to all of
[1205730 1205731 120573
119896] Moreover the latter may reveal that
VF119888(1205730) lt 10 as in cited examples and of special
import in predicting near the origin of the coordinatesystem
(S3) Our VF119907s capture correctly the concept apparently
intended by conventional VIF119906s Specifically if 119881perp-
orthogonality pertains then VF119907s as genuine ratios
of variances may supplant VIF119906s as now discredited
gages for variance inflation(S4) Returning to Section 32 we subscribe fully but for
different reasons to Belsleyrsquos and othersrsquo contention
14 Advances in Decision Sciences
Table 11 Matrices R119862and Rminus1
119862for the body fat data adjusted to give off-diagonal elements the value 00 except X10158401X2 code G = [1 0 0]
R119862
Rminus1119862
20 194365 194893 192934 04839 minus01465 minus01497 minus01510194365 25 194533 187499 minus01465 01648 minus00141 0194893 194533 25 188007 minus01497 minus00141 01676 0192934 187499 188007 25 minus01510 0 0 01565
Table 12 Summary VF119888s for the body fat data centered to minima and scaled to common lengths identified with varying G = [119892
12 11989213 11989223]
from Definition 12
Elements of G VF119888s
11989212
11989213
11989223
1205730
1205731
1205732
1205733
0 0 0 06665 43996 10282 44586
1 0 0 07002 43681 10208 44586
0 1 0 09196 10073 10282 10208
0 0 1 07201 43996 10073 43681
1 1 0 09848 10057 10208 10208
1 0 1 07595 43681 10003 43681
0 1 1 10248 10073 10073 10160
that uncentered VIF119906s are essential in assessing ill-
conditioned data In the geometry of ill-conditioningthese should be retained in the pantheon of regres-sion diagnostics not as now debunked gages ofvariance inflation but as more compelling angularmeasures of the collinearity of each column of X
0=
[1119899X1X2 X
119896] with the subspace spanned by
remaining columns(S5) Specifically the degree of collinearity of regressors
with the intercept is quantified by 1205790deg which
if small is known to ldquocorrupt the estimates of allparameters in the model whether or not the interceptis itself of interest and whether or not the data havebeen centeredrdquo [21 p90]This in turnwould appear toelevate 120579
0in importance as a critical ill-conditioning
diagnostic(S6) In contrast Theorem 4(iii) identifies conventional
VIF119888s with angles between a centered regressor and
the subspace spanned by remaining centered regres-sors For example
1205791= arccos([
[
1 minus1
VIFc ( ldquo1205731)]
]
12
) (30)
is the angle between the centered vector Z1= B119899X1
and the remaining centered vectors These lie in thelinear subspace of R119899 comprising the orthogonalcomplement to 1
119899isin R119899 and thus bear on the
geometry of ill-conditioning in this subspace(S7) Whereas conventional VIFs are ldquoall or nonerdquo to
be gaged against strictly diagonal components ourReference models enable unlinking what may beunlinked while leaving other pairs of regressorslinked This enables us to assess effects on variances
attributed to allowable partial dependencies amongsome but not other regressors
In conclusion the foregoing summary points to limita-tions in modern software packages Minitab v15 119878119875119878119878 v19and 119878119860119878 v92 are standard software packages which computecollinearity diagnostics returning with the default optionsthe centered VIFs VIF
119888(120573119895) 1 le 119895 le 119896 To compute
the uncentered VIF119906s one needs to define a variable for
the constant term and perform a linear regression using theoptions of fitting without an intercept On the other handcomputations for VF
119888s and VF
119907s as introduced here require
additional if straightforward programming for exampleMaple and PROC IML of the SAS System as utilized here
References
[1] D A Belsley ldquoDemeaning conditioning diagnostics throughcenteringrdquo The American Statistician vol 38 no 2 pp 73ndash771984
[2] E R Mansfield and B P Helms ldquoDetecting multicollinearityrdquoThe American Statistician vol 36 no 3 pp 158ndash160 1982
[3] S D Simon and J P Lesage ldquoThe impact of collinearityinvolving the intercept term on the numerical accuracy ofregressionrdquo Computer Science in Economics and Managementvol 1 no 2 pp 137ndash152 1988
[4] D W Marquardt and R D Snee ldquoRidge regression in practicerdquoThe American Statistician vol 29 no 1 pp 3ndash20 1975
[5] K N Berk ldquoTolerance and condition in regression computa-tionsrdquo Journal of the American Statistical Association vol 72 no360 part 1 pp 863ndash866 1977
[6] A E Beaton D Rubin and J Barone ldquoThe acceptability ofregression solutions another look at computational accuracyrdquoJournal of the American Statistical Association vol 71 no 353pp 158ndash168 1976
Advances in Decision Sciences 15
[7] R B Davies and B Hutton ldquoThe effect of errors in theindependent variables in linear regressionrdquo Biometrika vol 62no 2 pp 383ndash391 1975
[8] D W Marquardt ldquoGeneralized inverses ridge regressionbiased linear estimation and nonlinear estimationrdquo Technomet-rics vol 12 no 3 pp 591ndash612 1970
[9] G W Stewart ldquoCollinearity and least squares regressionrdquoStatistical Science vol 2 no 1 pp 68ndash84 1987
[10] R D Snee and D W Marquardt ldquoComment collinearitydiagnostics depend on the domain of prediction themodel andthe datardquo The American Statistician vol 38 no 2 pp 83ndash871984
[11] R HMyers Classical andModern Regression with ApplicationsPWS-Kent Boston Mass USA 2nd edition 1990
[12] D A Belsley E Kuh and R E Welsch Regression DiagnosticsIdentifying Influential Data and Sources of Collinearity JohnWiley amp Sons New York NY USA 1980
[13] A van der Sluis ldquoCondition numbers and equilibration ofmatricesrdquo Numerische Mathematik vol 14 pp 14ndash23 1969
[14] G C S Wang ldquoHow to handle multicollinearity in regressionmodelingrdquo Journal of Business Forecasting Methods amp Systemvol 15 no 1 pp 23ndash27 1996
[15] G C S Wang and C L Jain Regression Analysis Modeling ampForecasting Graceway Flushing NY USA 2003
[16] N Adnan M H Ahmad and R Adnan ldquoA comparative studyon some methods for handling multicollinearity problemsrdquoMatematika UTM vol 22 pp 109ndash119 2006
[17] D R Jensen and D E Ramirez ldquoAnomalies in the foundationsof ridge regressionrdquo International Statistical Review vol 76 pp89ndash105 2008
[18] D R Jensen and D E Ramirez ldquoSurrogate models in ill-conditioned systemsrdquo Journal of Statistical Planning and Infer-ence vol 140 no 7 pp 2069ndash2077 2010
[19] P F Velleman and R E Welsch ldquoEfficient computing ofregression diagnosticsrdquo The American Statistician vol 35 no4 pp 234ndash242 1981
[20] R F Gunst ldquoRegression analysis with multicollinear predictorvariables definition detection and effectsrdquo Communications inStatistics A vol 12 no 19 pp 2217ndash2260 1983
[21] G W Stewart ldquoComment well-conditioned collinearityindicesrdquo Statistical Science vol 2 no 1 pp 86ndash91 1987
[22] D A Belsley ldquoCentering the constant first-differencing andassessing conditioningrdquo in Model Reliability D A Belsley andE Kuh Eds pp 117ndash153 MIT Press Cambridge Mass USA1986
[23] G Smith and F Campbell ldquoA critique of some ridge regressionmethodsrdquo Journal of the American Statistical Association vol 75no 369 pp 74ndash103 1980
[24] R M Orsquobrien ldquoA caution regarding rules of thumb for varianceinflation factorsrdquoQualityampQuantity vol 41 no 5 pp 673ndash6902007
[25] R F Gunst ldquoComment toward a balanced assessment ofcollinearity diagnosticsrdquo The American Statistician vol 38 pp79ndash82 1984
[26] F A Graybill Introduction to Matrices with Applications inStatistics Wadsworth Belmont Calif USA 1969
[27] D Sengupta and P Bhimasankaram ldquoOn the roles of observa-tions in collinearity in the linearmodelrdquo Journal of the AmericanStatistical Association vol 92 no 439 pp 1024ndash1032 1997
[28] J O van Vuuren and W J Conradie ldquoDiagnostics and plotsfor identifying collinearity-influential observations and for clas-sifying multivariate outliersrdquo South African Statistical Journalvol 34 no 2 pp 135ndash161 2000
[29] R R HockingMethods and Applications of LinearModels JohnWiley amp Sons Hoboken NJ USA 2nd edition 2003
[30] R R Hocking and O J Pendleton ldquoThe regression dilemmardquoCommunications in Statistics vol 12 pp 497ndash527 1983
10 Advances in Decision Sciences
Table 3 Basic 119881perp-orthogonal design X0 = [18 X1 X2 X3] and a linked design Z0 = [18 Z1 Z2 Z3] of order (8 times 4)
Orthogonal (X1X2X3) Linked (Z
1Z2Z3)
18
X1
X2
X3
18
Z1
Z2
Z3
1 minus1084470 0056899 0541778 1 minus1071550 0116381 06534861 minus1090010 minus0040490 0117026 1 minus1087820 minus0027320 01160861 0979050 0002564 2337454 1 0973345 0260677 21996871 minus1090230 0016839 0170711 1 minus1081700 0035588 00706941 0127352 2821941 minus0071670 1 0438204 2796767 minus00707101 1045409 minus0122670 minus1476870 1 1025468 minus0285010 minus16185601 1090803 minus0088970 0043354 1 1074306 minus0083640 01769281 1090739 minus0092300 0108637 1 1073875 minus0079740 0244567
Table 4 Matrices X10158400X0 and (X10158400X0)minus1 for the orthogonal data under model119872
0
X10158400X0
(X10158400X0)minus1
8 10686 25538 17704 01504 minus00201 minus00480 minus0033310686 8 0 0 minus00201 01277 00064 0004425538 0 8 0 minus00480 00064 01403 0010617704 0 0 8 minus00333 00044 00106 01324
already is 119881perp-orthogonal accordingly R119881= X10158400X0 and thus
the VF119907s are all unity
In view of dissonance between 119881perp and 119862
perp orthogonalitythe sums of squares and products for the mean-centered Xare
X1015840B119899X = [
[
78572 minus03411 minus02365
minus03411 71847 minus05652
minus02365 minus05652 76082
]
]
(23)
reflecting nonnegligible negative dependencies to distin-guish the 119881
perp-orthogonal X from the 119862perp-nonorthogonal
matrix Z = B119899X of deviations
To continue we suppose instead that the 119881perp-orthogonalmodel was to be recast as 119862
perp-orthogonal The Referencemodel and its inverse are listed in Table 5 Variance factorsas ratios of diagonal elements of (X1015840
0X0)minus1 in Table 4 to those
of Rminus1119862
in Table 5 are
[VF119888(1205730) VF119888(1205731) VF119888(1205732) VF119888(1205733)]
= [10168 10032 10082 10070]
(24)
reflecting negligible differences in precision This parallelsresults reported in Table 2 comparing 119863(0) to 119863(06) asReference except for larger differences in Table 2 namely[128681225012250]
612 Linked Data For the Linked Data under model 1198720
the matrix Z10158400Z0 follows routinely from Table 3 its inverse(Z10158400Z0)minus1 is listed in Table 6 The conventional uncentered
VIF119906s are now [120320 103408 113760 105480] These
again fail to gage variance inflation since Z10158400Z0cannot be
diagonal From (10) the principal angle between the constantvector and the span of the regressor vectors is 120579
0= 65735
degrees
Remark 15 Observe that the choice for R119862may be posed as
seeking R119862
= U(119909) such that Rminus1119862(2 3) = 00 then solving
numerically for 119909 using Maple for example However thealgorithm in Definition 12(ii) affords a direct solution thatC119908should be diagonal stipulates its off-diagonal element as
X10158401X2minus 35 = 0 so that X1015840
1X2= 06 in R
119862at expression (20)
To illustrate Theorem 4(iii) we compute cos(120579) = (1 minus
110889)12 = 02857 and 120579 = 73398 deg as the angle between
the vectors [X1X2] of (1) when centered to their means For
slopes in 1198720 [9] shows that VIF
119888(120573119895) lt VIF
119906(120573119895) 1 le
119894 le 119896 This follows since their numerators are equal butdenominators are reciprocals of lengths of the centered anduncentered regressor vectors To illustrate numerators areequal for VIF
119906(1205733) = 03889(13) = 11667 and for
VIF119888(1205733) = 03889(128) = 10889 but denominators are
reciprocals of X10158402X2= 3 and sum
5
119895=1(1198832119895
minus 1198832)2= 28
(119894) 119881perp Reference Model To rectify this malapropism we seek
in Definition 8 a model R119881as Reference This is found on
setting all off-diagonal elements of Z10158400Z0to zero excluding
the first row and column Its inverse Rminus1119881
is listed in Table 6The VF
119907s are ratios of diagonal elements of (Z1015840
0Z0)minus1 on the
left to diagonal elements of Rminus1119881
on the right namely
[VF119907(1205730) VF119907(1205731) VF119907(1205732) VF119907(1205733)]
= [09697 09991 09936 09943]
(25)
Against conventional wisdom it is counter-intuitive that[1205730 1205731 1205732 1205733] are all estimated with greater precision in the
119881perp-nonorthogonal model Z1015840
0Z0than in the 119881
perp-orthogonalmodel R
119881of Definition 8 As in Section 54 this serves again
to refute the tenet that ill-conditioning espouses inflated
Advances in Decision Sciences 11
Table 5 Matrices R119862and Rminus1
119862for checking 119862
perp-orthogonality in the orthogonal data X0 under model1198720
R119862
Rminus1119862
8 10686 25538 17704 01479 minus00170 minus00444 minus0029110686 8 03411 02365 minus00170 01273 0 025538 03411 8 05652 minus00444 0 01392 017704 02365 05652 8 minus00291 0 0 01314
Table 6 Matrices (Z10158400Z0)minus1 and Rminus1
119881for checking 119881
perp-orthogonality for the linked data under model1198720
(Z10158400Z0)minus1 Rminus1
119881
01504 minus00202 minus00461 minus00283 01551 minus00261 minus00530 minus00344minus00202 01293 minus000797 00053 minus00261 01294 00089 00058minus00461 minus00079 01422 minus00054 minus00530 00089 01431 00117minus00283 00053 minus00054 01319 minus00344 00058 00117 01326
variances for models in1198720 that is that VFs necessarily equal
or exceed unity(119894119894) 119862
perp Reference Model Further consider variances werethe Linked Data to be centered As in Definition 12(ii)we seek a Reference model R
119862such that C
119908= (W minus
M) is diagonal The required R119862
and its inverse arelisted in Table 7 Since variances in the Linked Data are[015040 012926 014220 013185] from Table 6 and Refer-ence values appear as diagonal elements of Rminus1
119862on the right
in Table 7 their ratios give the centered VF119888s as
[VF119888(1205730) VF119888(1205731) VF119888(1205732) VF119888(1205733)]
= [09920 10049 10047 10030]
(26)
We infer that changes in precision would be negligible ifinstead the Linked Data experiment was to be recast as a 119862perp-orthogonal experiment
As in Remark 15 the reference matrix R119862is constrained
by the first and second moments from X10158400X0and has free
parameters 1199091 1199092 1199093 for the (2 3) (2 4) (3 4) entries
The constraints for 119862perp-orthogonality are Rminus1
119862(2 3) =
Rminus1119862(2 4) = Rminus1
119862(3 4) = 0 Although this system of
equations can be solved with numerical software such asMaple the algorithm stated in Definition 12 easily yields thedirect solution given here
62 The Model 119872119908 For the Linked Data take 119884
119894= 12057311198851198941+
12057321198851198942
+ 12057331198851198943
+ 120598119894 1 le 119894 le 8 as Y = Z120573 + 120598
where the expected response at 1198851
= 1198852
= 1198853
= 0
is 119864(119884) = 00 and there is no occasion to translate theregressorsThe lower right (3 times 3) submatrix ofZ1015840
0Z0(4 times 4)
from 1198720is Z1015840Z its inverse is listed in Table 8 The ratios
of diagonal elements of (Z1015840Z)minus1 to reciprocals of diagonalelements of Z1015840Z are the conventional uncentered VIF
119906s
namely [10123 10247 10123] Equivalently scaling Z1015840Z tohave unit diagonals and inverting gives Cminus1 as in Table 8 itsdiagonal elements are the VIF
119906sThroughout Section 6 these
comprise the only correct interpretation of conventionalVIF119906s as genuine variance ratios
7 Case Study Body Fat Data
71 The Setting Body fat and morphogenic measurementswere reported for 119899 = 20 human subjects in Table D11 of[29 p717] Response and regressor variables are119884 amount ofbody fat X
1 triceps skinfold thickness X
2 thigh circumfer-
ence and X3 mid-arm circumference under119872
0 119884119894= 1205730+
12057311198831+12057321198832+12057331198833+ 120598119894 From the original data as reported
the condition number of X10158400X0is 1039931 and the VIF
119906s are
[2714800 4469419 630998 778703] On scaling columnsof X to equal column lengths X
119894 = 50 119894 = 1 2 3 the
resulting condition number of the revisedX10158400X0is 2 9696518
and the uncenteredVIF119906s are as before from invariance under
scalingAgainst complaints that regressors are remote from their
natural origins we center regressors to [0 0 0] as origin onsubtracting the minimal element from each column as inRemark 2 and scaling these to X
119894 = 50 119894 = 1 2 3
This is motivated on grounds that centered diagnostics apartfrom VIF
119888(1205730) are invariant under shifts and scalings from
Lemma 14 In addition under the new origin [0 0 0] 1205730
assumes prominence as the baseline response against whichchanges due to regressors are to be gaged The original dataare available on the publisherrsquos web page for [29]
Taking these shifted and scaled vectors as columnsof X = [X
1X2X3] in the revised X
0= [1
119899X]
values for X10158400X0
and its inverse are listed in Table 9The condition number of X1015840
0X0
is now 1136969 andits VIF
119906s are [6775617998742782174484] Additionally
from Theorem 4(ii) we recover the angle 1205790= 22592 deg to
quantify collinearity of regressors with the constant vectorIn addition the scaled and centered correlationmatrix for
X is
R = [
[
10 00847 08781
00847 10 01424
08781 01424 10
]
]
(27)
indicating strong linkage between mean-centered versions of(X1X3)Moreover the experimental design is neither119881perp nor
119862perp-orthogonal since neither the lower right (3 times 3) block of
X10158400X0nor the centered matrixC = (X1015840Xminus119899XX1015840) is diagonal
12 Advances in Decision Sciences
Table 7 Matrices R119862and Rminus1
119862for checking 119862
perp-orthogonality in the linked data under model1198720
R119862
Rminus1119862
8 13441 27337 17722 01516 minus00216 minus00484 minus0029113441 8 04593 02977 minus00216 01286 0 027337 04593 8 06056 minus00484 0 01415 017722 02977 06056 8 minus00291 0 0 01314
Table 8 Matrices (Z1015840Z)minus1 and Cminus1 for the linked data under model119872119908
(Z1015840Z)minus1 Cminus1
01265 minus00141 00015 10123 minus01125 00123minus00141 01281 minus00141 minus01125 10247 minus0112500015 minus00141 01265 00123 minus01125 10123
(119894) 119881perp Reference Model To compare this with a 119881
perp-or-thogonal design we turn again to Definition 8 Howeverevaluating the test criterion in Lemma 9(119894) shows that thisconfiguration is not commensurate with 119881
perp-orthogonalityso that the VF
119907s are undefined Comparisons with 119862
perp-orthogonal models are addressed next(119894119894) 119862perp Reference Model As in Definition 12(ii) the centering
matrix is
119899XX1015840 = [
[
188889 189402 187499
189402 189916 188007
187499 188007 186118
]
]
(28)
To check against 119862perp-orthogonality we obtain the matrix Wof Definition 12(ii) on replacing off-diagonal elements of thelower right (3 times 3) submatrix ofX1015840
0X0by corresponding off-
diagonal elements of 119899XX1015840 The result is the matrix R119862and
its inverse as given in Table 10 where diagonal elements ofRminus1119862
are the Reference variances The VF119888s found as ratios
of diagonal elements of (X10158400X0)minus1 relative to those of Rminus1
119862
are listed in Table 12 under G = [0 0 0] the indicator ofDefinition 12(i) for this case
72 Variance Factors and Linkage Traditional gages of ill-conditioning are patently absurd on occasion By conventionVIFs are ldquoall or nonerdquo wherein Reference models entailstrictly diagonal components In practice some regressorsare inextricably linked to get or even visualize orthogonalregressors may go beyond feasible experimental rangesrequire extrapolation beyond practical limits and challengecredulity Response functions so constrained are described in[30] as ldquopicket fencesrdquo In such cases taking C to be diagonalas its 119862perp Reference is moot at best an academic folly abettedin turn by default in standard software packages In shortconventional diagnostics here offer answers to irrelevantquestions Given pairs of regressors irrevocably linked weseek instead to assess effects on variances for other pairsthat could be unlinked by design These comments applyespecially to entries under G = [0 0 0] in Table 12
To proceed we define essential ill-conditioning as regres-sors inextricably linked necessarily to remain so and asnonessential if regressors could be unlinked by design As a
followup to Section 32 for X = 0 we infer that the constantvector is inextricably linked with regressors thus accountingfor essential ill-conditioning not removed by centeringcontrary to claims in [4] Indeed this is the essence ofDefinition 8
Fortunately these limitations may be remedied throughReference models adapted to this purpose This in turnexploits the special notation and trappings of Definition 12(i)as follows In addition toG = [0 0 0] we iterate for indicatorsG = [119892
12 11989213 11989223] taking values in [1 0 0] [0 1 0] [0 0 1]
[1 1 0] [1 0 1] [0 1 1] Values of VF119888s thus obtained are
listed in Table 12 To fix ideas the Reference R119862for G =
[1 0 0] that is for the constraint R119862(2 3) = X1015840
0X0(2 3) =
194533 is given in Table 11 together with its inverse Foundas ratios of diagonal elements of (X1015840
0X0)minus1 relative to those of
Rminus1119862
in Table 11 VF119888s are listed in Table 12 underG = [1 0 0]
In short R119862is ldquoas 119862
perp-orthogonal as it could berdquo given theconstraint Other VF
119888s in Table 12 proceed similarly For all
cases in Table 12 excluding G = [0 1 1] 1205730is estimated
with greater efficiency in the original model than any of thepostulated Reference models
As in Remark 15 the reference matrix R119862
for G =
[1 0 0] is constrained by X10158401X2
= 194533 to be retainedwhereas X1015840
1X3
= 1199091and X1015840
2X3
= 1199092are to be determined
from Rminus1119862(2 4) = Rminus1
119862(3 4) = 0 Although this system
can be solved numerically as noted the algorithm stated inDefinition 12 easily yields the direct solution given here
Recall 1198831 triceps skinfold thickness 119883
2 thigh cir-
cumference and 1198833 mid-arm circumference and from
(27) that (1198831 1198833) are strongly linked Invoking the cutoff
rule [24] for VIFs in excess of 40 it is seen for G =
[0 0 0] in Table 12 that VF119888(1205731) and VF
119888(1205733) are excessive
where [X1X2X3] are forced to be mutually uncorrelated
as Reference Remarkably similar values are reported forG isin [1 0 0] [ 0 0 1] [1 0 1] all gaged against intractableReference models wherein (119883
1 1198833) are postulated to be
uncorrelated however incrediblyOn the other hand VF
119888s for G isin [0 1 0] [1 1 0]
[0 1 1] in Table 12 are likewise comparable where (X1X3)
are allowed to remain linked at X10158401X3= 242362 Here the
VF119888s reflect negligible changes in efficiency of the estimates
Advances in Decision Sciences 13
Table 9 Matrices X10158400X0 and (X10158400X0)minus1 for the body fat data under model119872
0centered to minima and scaled to equal lengths
X10158400X0
(X10158400X0)minus1
20 194365 194893 192934 03388 minus01284 minus01482 minus00203194365 25 194533 242362 minus01284 07199 00299 minus06224194893 194533 25 196832 minus01482 00299 01711 minus00494192934 242362 196832 25 minus00203 minus06224 minus00494 06979
Table 10 Matrices R119862and Rminus1
119862for the body fat data adjusted to give off-diagonal elements the value 00 with code G = [0 0 0]
R119862
Rminus1119862
20 194365 194893 192934 05083 minus01590 minus01622 minus01510194365 25 189402 187499 minus01590 01636 0 0194893 189402 25 188007 minus01622 0 01664 0192934 187499 188007 25 minus01510 0 0 01565
in comparison with the originalX10158400X0 where [X
1X2X3] are
all linked In summary negligible changes in overall efficiencywould accrue on recasting the experiment so that (X
1X2)
and (X2X3) were pairwise uncorrelated
Table 12 conveys further useful information on notingthat VFs as ratios are multiplicative and divisible Forexample take the model codedG = [1 1 0] whereX1015840
1X2and
X10158401X3are retained at their initial values under X1015840
0X0 namely
194533 and 242362 from Table 9 with (X2X3) unlinked
Now taking G = [0 1 0] as reference we extract the VFson dividing elements in Table 12 at G = [0 1 0] by those ofG = [1 1 0] The result is
[VF (1205730) VF (120573
1) VF (120573
2) VF (120573
3)]
= [09196
0984810073
1005710282
1020810208
10208]
= [09338 10017 10072 10000]
(29)
8 Conclusions
Our goal is clarity in the actual workings of VIFs as ill-conditioning diagnostics thus to identify and rectifymiscon-ceptions regarding the use of VIFs for models X
0= [1119899X]
with intercept Would-be arbiters for ldquocorrectrdquo usage havedivided between VIF
119888s for mean-centered regressors and
VIF119906s for uncentered data We take issue with both Conven-
tional but clouded vision holds that (i) VIF119906s gage variance
inflation owing to nonorthogonal columns of X0as vectors
in R119899 (ii) VIFs ge 10 and (iii) models having orthogonalmean-centered regressors are ldquoidealrdquo in possessing VIF
119888s of
value unity Reprising Remark 3 these properties widely andcorrectly understood for models in119872
119908 appear to have been
expropriated without verification for models in1198720hence the
anomaliesAccordingly we have distinguished vector-space orthog-
onality (119881perp) of columns of X0 in contrast to orthogonality
(119862perp) of mean-centered columns of X A key feature is theconstruction of second-moment arrays as Reference models(R119881R119862) encoding orthogonalities of these types and their
Variance Factors as VF119907rsquos and VF
119888rsquos respectively Contrary
to convention we demonstrate analytically and through casestudies that (i1015840) VIF
119906s do not gage variance inflation (ii1015840)
VFs ge 10 need not hold whereas VF lt 10 representsVariance Deflation and (iii1015840) models having orthogonalcentered regressors are not necessarily ldquoidealrdquo In particularvariance deflation occurs when ill-conditioned data yieldsmaller variances than corresponding orthogonal surrogates
In short our studies call out to adjudicate the prolongeddispute regarding centered and uncentered diagnostics formodels in119872
0 Our summary conclusions follow
(S1) The choice of a model with or without intercept isa substantive matter in the context of each exper-imental paradigm That choice is fixed beforehandin a particular setting is structural and is beyondthe scope of the present study Snee and Marquardt[10] correctly assert that VIFs are fully creditedand remain undisputed in models without interceptthese quantities unambiguously retain the meaningsoriginally intended namely as ratios of variances togage effects of nonorthogonal regressors For modelswith intercept the choice is between centered anduncentered diagnostics which is to be weighed asfollows
(S2) ConventionalVIF119888s apply incompletely to slopes only
not necessarily grasping fully that a system is illconditioned Of particular concern is missing evi-dence on collinearity of regressors with the interceptOn the other hand if 119862
perp-orthogonality pertainsthen VF
119888s may supplant VIF
119888s as applicable to all of
[1205730 1205731 120573
119896] Moreover the latter may reveal that
VF119888(1205730) lt 10 as in cited examples and of special
import in predicting near the origin of the coordinatesystem
(S3) Our VF119907s capture correctly the concept apparently
intended by conventional VIF119906s Specifically if 119881perp-
orthogonality pertains then VF119907s as genuine ratios
of variances may supplant VIF119906s as now discredited
gages for variance inflation(S4) Returning to Section 32 we subscribe fully but for
different reasons to Belsleyrsquos and othersrsquo contention
14 Advances in Decision Sciences
Table 11 Matrices R119862and Rminus1
119862for the body fat data adjusted to give off-diagonal elements the value 00 except X10158401X2 code G = [1 0 0]
R119862
Rminus1119862
20 194365 194893 192934 04839 minus01465 minus01497 minus01510194365 25 194533 187499 minus01465 01648 minus00141 0194893 194533 25 188007 minus01497 minus00141 01676 0192934 187499 188007 25 minus01510 0 0 01565
Table 12 Summary VF119888s for the body fat data centered to minima and scaled to common lengths identified with varying G = [119892
12 11989213 11989223]
from Definition 12
Elements of G VF119888s
11989212
11989213
11989223
1205730
1205731
1205732
1205733
0 0 0 06665 43996 10282 44586
1 0 0 07002 43681 10208 44586
0 1 0 09196 10073 10282 10208
0 0 1 07201 43996 10073 43681
1 1 0 09848 10057 10208 10208
1 0 1 07595 43681 10003 43681
0 1 1 10248 10073 10073 10160
that uncentered VIF119906s are essential in assessing ill-
conditioned data In the geometry of ill-conditioningthese should be retained in the pantheon of regres-sion diagnostics not as now debunked gages ofvariance inflation but as more compelling angularmeasures of the collinearity of each column of X
0=
[1119899X1X2 X
119896] with the subspace spanned by
remaining columns(S5) Specifically the degree of collinearity of regressors
with the intercept is quantified by 1205790deg which
if small is known to ldquocorrupt the estimates of allparameters in the model whether or not the interceptis itself of interest and whether or not the data havebeen centeredrdquo [21 p90]This in turnwould appear toelevate 120579
0in importance as a critical ill-conditioning
diagnostic(S6) In contrast Theorem 4(iii) identifies conventional
VIF119888s with angles between a centered regressor and
the subspace spanned by remaining centered regres-sors For example
1205791= arccos([
[
1 minus1
VIFc ( ldquo1205731)]
]
12
) (30)
is the angle between the centered vector Z1= B119899X1
and the remaining centered vectors These lie in thelinear subspace of R119899 comprising the orthogonalcomplement to 1
119899isin R119899 and thus bear on the
geometry of ill-conditioning in this subspace(S7) Whereas conventional VIFs are ldquoall or nonerdquo to
be gaged against strictly diagonal components ourReference models enable unlinking what may beunlinked while leaving other pairs of regressorslinked This enables us to assess effects on variances
attributed to allowable partial dependencies amongsome but not other regressors
In conclusion the foregoing summary points to limita-tions in modern software packages Minitab v15 119878119875119878119878 v19and 119878119860119878 v92 are standard software packages which computecollinearity diagnostics returning with the default optionsthe centered VIFs VIF
119888(120573119895) 1 le 119895 le 119896 To compute
the uncentered VIF119906s one needs to define a variable for
the constant term and perform a linear regression using theoptions of fitting without an intercept On the other handcomputations for VF
119888s and VF
119907s as introduced here require
additional if straightforward programming for exampleMaple and PROC IML of the SAS System as utilized here
References
[1] D A Belsley ldquoDemeaning conditioning diagnostics throughcenteringrdquo The American Statistician vol 38 no 2 pp 73ndash771984
[2] E R Mansfield and B P Helms ldquoDetecting multicollinearityrdquoThe American Statistician vol 36 no 3 pp 158ndash160 1982
[3] S D Simon and J P Lesage ldquoThe impact of collinearityinvolving the intercept term on the numerical accuracy ofregressionrdquo Computer Science in Economics and Managementvol 1 no 2 pp 137ndash152 1988
[4] D W Marquardt and R D Snee ldquoRidge regression in practicerdquoThe American Statistician vol 29 no 1 pp 3ndash20 1975
[5] K N Berk ldquoTolerance and condition in regression computa-tionsrdquo Journal of the American Statistical Association vol 72 no360 part 1 pp 863ndash866 1977
[6] A E Beaton D Rubin and J Barone ldquoThe acceptability ofregression solutions another look at computational accuracyrdquoJournal of the American Statistical Association vol 71 no 353pp 158ndash168 1976
Advances in Decision Sciences 15
[7] R B Davies and B Hutton ldquoThe effect of errors in theindependent variables in linear regressionrdquo Biometrika vol 62no 2 pp 383ndash391 1975
[8] D W Marquardt ldquoGeneralized inverses ridge regressionbiased linear estimation and nonlinear estimationrdquo Technomet-rics vol 12 no 3 pp 591ndash612 1970
[9] G W Stewart ldquoCollinearity and least squares regressionrdquoStatistical Science vol 2 no 1 pp 68ndash84 1987
[10] R D Snee and D W Marquardt ldquoComment collinearitydiagnostics depend on the domain of prediction themodel andthe datardquo The American Statistician vol 38 no 2 pp 83ndash871984
[11] R HMyers Classical andModern Regression with ApplicationsPWS-Kent Boston Mass USA 2nd edition 1990
[12] D A Belsley E Kuh and R E Welsch Regression DiagnosticsIdentifying Influential Data and Sources of Collinearity JohnWiley amp Sons New York NY USA 1980
[13] A van der Sluis ldquoCondition numbers and equilibration ofmatricesrdquo Numerische Mathematik vol 14 pp 14ndash23 1969
[14] G C S Wang ldquoHow to handle multicollinearity in regressionmodelingrdquo Journal of Business Forecasting Methods amp Systemvol 15 no 1 pp 23ndash27 1996
[15] G C S Wang and C L Jain Regression Analysis Modeling ampForecasting Graceway Flushing NY USA 2003
[16] N Adnan M H Ahmad and R Adnan ldquoA comparative studyon some methods for handling multicollinearity problemsrdquoMatematika UTM vol 22 pp 109ndash119 2006
[17] D R Jensen and D E Ramirez ldquoAnomalies in the foundationsof ridge regressionrdquo International Statistical Review vol 76 pp89ndash105 2008
[18] D R Jensen and D E Ramirez ldquoSurrogate models in ill-conditioned systemsrdquo Journal of Statistical Planning and Infer-ence vol 140 no 7 pp 2069ndash2077 2010
[19] P F Velleman and R E Welsch ldquoEfficient computing ofregression diagnosticsrdquo The American Statistician vol 35 no4 pp 234ndash242 1981
[20] R F Gunst ldquoRegression analysis with multicollinear predictorvariables definition detection and effectsrdquo Communications inStatistics A vol 12 no 19 pp 2217ndash2260 1983
[21] G W Stewart ldquoComment well-conditioned collinearityindicesrdquo Statistical Science vol 2 no 1 pp 86ndash91 1987
[22] D A Belsley ldquoCentering the constant first-differencing andassessing conditioningrdquo in Model Reliability D A Belsley andE Kuh Eds pp 117ndash153 MIT Press Cambridge Mass USA1986
[23] G Smith and F Campbell ldquoA critique of some ridge regressionmethodsrdquo Journal of the American Statistical Association vol 75no 369 pp 74ndash103 1980
[24] R M Orsquobrien ldquoA caution regarding rules of thumb for varianceinflation factorsrdquoQualityampQuantity vol 41 no 5 pp 673ndash6902007
[25] R F Gunst ldquoComment toward a balanced assessment ofcollinearity diagnosticsrdquo The American Statistician vol 38 pp79ndash82 1984
[26] F A Graybill Introduction to Matrices with Applications inStatistics Wadsworth Belmont Calif USA 1969
[27] D Sengupta and P Bhimasankaram ldquoOn the roles of observa-tions in collinearity in the linearmodelrdquo Journal of the AmericanStatistical Association vol 92 no 439 pp 1024ndash1032 1997
[28] J O van Vuuren and W J Conradie ldquoDiagnostics and plotsfor identifying collinearity-influential observations and for clas-sifying multivariate outliersrdquo South African Statistical Journalvol 34 no 2 pp 135ndash161 2000
[29] R R HockingMethods and Applications of LinearModels JohnWiley amp Sons Hoboken NJ USA 2nd edition 2003
[30] R R Hocking and O J Pendleton ldquoThe regression dilemmardquoCommunications in Statistics vol 12 pp 497ndash527 1983
Advances in Decision Sciences 11
Table 5 Matrices R119862and Rminus1
119862for checking 119862
perp-orthogonality in the orthogonal data X0 under model1198720
R119862
Rminus1119862
8 10686 25538 17704 01479 minus00170 minus00444 minus0029110686 8 03411 02365 minus00170 01273 0 025538 03411 8 05652 minus00444 0 01392 017704 02365 05652 8 minus00291 0 0 01314
Table 6 Matrices (Z10158400Z0)minus1 and Rminus1
119881for checking 119881
perp-orthogonality for the linked data under model1198720
(Z10158400Z0)minus1 Rminus1
119881
01504 minus00202 minus00461 minus00283 01551 minus00261 minus00530 minus00344minus00202 01293 minus000797 00053 minus00261 01294 00089 00058minus00461 minus00079 01422 minus00054 minus00530 00089 01431 00117minus00283 00053 minus00054 01319 minus00344 00058 00117 01326
variances for models in1198720 that is that VFs necessarily equal
or exceed unity(119894119894) 119862
perp Reference Model Further consider variances werethe Linked Data to be centered As in Definition 12(ii)we seek a Reference model R
119862such that C
119908= (W minus
M) is diagonal The required R119862
and its inverse arelisted in Table 7 Since variances in the Linked Data are[015040 012926 014220 013185] from Table 6 and Refer-ence values appear as diagonal elements of Rminus1
119862on the right
in Table 7 their ratios give the centered VF119888s as
[VF119888(1205730) VF119888(1205731) VF119888(1205732) VF119888(1205733)]
= [09920 10049 10047 10030]
(26)
We infer that changes in precision would be negligible ifinstead the Linked Data experiment was to be recast as a 119862perp-orthogonal experiment
As in Remark 15 the reference matrix R119862is constrained
by the first and second moments from X10158400X0and has free
parameters 1199091 1199092 1199093 for the (2 3) (2 4) (3 4) entries
The constraints for 119862perp-orthogonality are Rminus1
119862(2 3) =
Rminus1119862(2 4) = Rminus1
119862(3 4) = 0 Although this system of
equations can be solved with numerical software such asMaple the algorithm stated in Definition 12 easily yields thedirect solution given here
62 The Model 119872119908 For the Linked Data take 119884
119894= 12057311198851198941+
12057321198851198942
+ 12057331198851198943
+ 120598119894 1 le 119894 le 8 as Y = Z120573 + 120598
where the expected response at 1198851
= 1198852
= 1198853
= 0
is 119864(119884) = 00 and there is no occasion to translate theregressorsThe lower right (3 times 3) submatrix ofZ1015840
0Z0(4 times 4)
from 1198720is Z1015840Z its inverse is listed in Table 8 The ratios
of diagonal elements of (Z1015840Z)minus1 to reciprocals of diagonalelements of Z1015840Z are the conventional uncentered VIF
119906s
namely [10123 10247 10123] Equivalently scaling Z1015840Z tohave unit diagonals and inverting gives Cminus1 as in Table 8 itsdiagonal elements are the VIF
119906sThroughout Section 6 these
comprise the only correct interpretation of conventionalVIF119906s as genuine variance ratios
7 Case Study Body Fat Data
71 The Setting Body fat and morphogenic measurementswere reported for 119899 = 20 human subjects in Table D11 of[29 p717] Response and regressor variables are119884 amount ofbody fat X
1 triceps skinfold thickness X
2 thigh circumfer-
ence and X3 mid-arm circumference under119872
0 119884119894= 1205730+
12057311198831+12057321198832+12057331198833+ 120598119894 From the original data as reported
the condition number of X10158400X0is 1039931 and the VIF
119906s are
[2714800 4469419 630998 778703] On scaling columnsof X to equal column lengths X
119894 = 50 119894 = 1 2 3 the
resulting condition number of the revisedX10158400X0is 2 9696518
and the uncenteredVIF119906s are as before from invariance under
scalingAgainst complaints that regressors are remote from their
natural origins we center regressors to [0 0 0] as origin onsubtracting the minimal element from each column as inRemark 2 and scaling these to X
119894 = 50 119894 = 1 2 3
This is motivated on grounds that centered diagnostics apartfrom VIF
119888(1205730) are invariant under shifts and scalings from
Lemma 14 In addition under the new origin [0 0 0] 1205730
assumes prominence as the baseline response against whichchanges due to regressors are to be gaged The original dataare available on the publisherrsquos web page for [29]
Taking these shifted and scaled vectors as columnsof X = [X
1X2X3] in the revised X
0= [1
119899X]
values for X10158400X0
and its inverse are listed in Table 9The condition number of X1015840
0X0
is now 1136969 andits VIF
119906s are [6775617998742782174484] Additionally
from Theorem 4(ii) we recover the angle 1205790= 22592 deg to
quantify collinearity of regressors with the constant vectorIn addition the scaled and centered correlationmatrix for
X is
R = [
[
10 00847 08781
00847 10 01424
08781 01424 10
]
]
(27)
indicating strong linkage between mean-centered versions of(X1X3)Moreover the experimental design is neither119881perp nor
119862perp-orthogonal since neither the lower right (3 times 3) block of
X10158400X0nor the centered matrixC = (X1015840Xminus119899XX1015840) is diagonal
12 Advances in Decision Sciences
Table 7 Matrices R119862and Rminus1
119862for checking 119862
perp-orthogonality in the linked data under model1198720
R119862
Rminus1119862
8 13441 27337 17722 01516 minus00216 minus00484 minus0029113441 8 04593 02977 minus00216 01286 0 027337 04593 8 06056 minus00484 0 01415 017722 02977 06056 8 minus00291 0 0 01314
Table 8 Matrices (Z1015840Z)minus1 and Cminus1 for the linked data under model119872119908
(Z1015840Z)minus1 Cminus1
01265 minus00141 00015 10123 minus01125 00123minus00141 01281 minus00141 minus01125 10247 minus0112500015 minus00141 01265 00123 minus01125 10123
(119894) 119881perp Reference Model To compare this with a 119881
perp-or-thogonal design we turn again to Definition 8 Howeverevaluating the test criterion in Lemma 9(119894) shows that thisconfiguration is not commensurate with 119881
perp-orthogonalityso that the VF
119907s are undefined Comparisons with 119862
perp-orthogonal models are addressed next(119894119894) 119862perp Reference Model As in Definition 12(ii) the centering
matrix is
119899XX1015840 = [
[
188889 189402 187499
189402 189916 188007
187499 188007 186118
]
]
(28)
To check against 119862perp-orthogonality we obtain the matrix Wof Definition 12(ii) on replacing off-diagonal elements of thelower right (3 times 3) submatrix ofX1015840
0X0by corresponding off-
diagonal elements of 119899XX1015840 The result is the matrix R119862and
its inverse as given in Table 10 where diagonal elements ofRminus1119862
are the Reference variances The VF119888s found as ratios
of diagonal elements of (X10158400X0)minus1 relative to those of Rminus1
119862
are listed in Table 12 under G = [0 0 0] the indicator ofDefinition 12(i) for this case
72 Variance Factors and Linkage Traditional gages of ill-conditioning are patently absurd on occasion By conventionVIFs are ldquoall or nonerdquo wherein Reference models entailstrictly diagonal components In practice some regressorsare inextricably linked to get or even visualize orthogonalregressors may go beyond feasible experimental rangesrequire extrapolation beyond practical limits and challengecredulity Response functions so constrained are described in[30] as ldquopicket fencesrdquo In such cases taking C to be diagonalas its 119862perp Reference is moot at best an academic folly abettedin turn by default in standard software packages In shortconventional diagnostics here offer answers to irrelevantquestions Given pairs of regressors irrevocably linked weseek instead to assess effects on variances for other pairsthat could be unlinked by design These comments applyespecially to entries under G = [0 0 0] in Table 12
To proceed we define essential ill-conditioning as regres-sors inextricably linked necessarily to remain so and asnonessential if regressors could be unlinked by design As a
followup to Section 32 for X = 0 we infer that the constantvector is inextricably linked with regressors thus accountingfor essential ill-conditioning not removed by centeringcontrary to claims in [4] Indeed this is the essence ofDefinition 8
Fortunately these limitations may be remedied throughReference models adapted to this purpose This in turnexploits the special notation and trappings of Definition 12(i)as follows In addition toG = [0 0 0] we iterate for indicatorsG = [119892
12 11989213 11989223] taking values in [1 0 0] [0 1 0] [0 0 1]
[1 1 0] [1 0 1] [0 1 1] Values of VF119888s thus obtained are
listed in Table 12 To fix ideas the Reference R119862for G =
[1 0 0] that is for the constraint R119862(2 3) = X1015840
0X0(2 3) =
194533 is given in Table 11 together with its inverse Foundas ratios of diagonal elements of (X1015840
0X0)minus1 relative to those of
Rminus1119862
in Table 11 VF119888s are listed in Table 12 underG = [1 0 0]
In short R119862is ldquoas 119862
perp-orthogonal as it could berdquo given theconstraint Other VF
119888s in Table 12 proceed similarly For all
cases in Table 12 excluding G = [0 1 1] 1205730is estimated
with greater efficiency in the original model than any of thepostulated Reference models
As in Remark 15 the reference matrix R119862
for G =
[1 0 0] is constrained by X10158401X2
= 194533 to be retainedwhereas X1015840
1X3
= 1199091and X1015840
2X3
= 1199092are to be determined
from Rminus1119862(2 4) = Rminus1
119862(3 4) = 0 Although this system
can be solved numerically as noted the algorithm stated inDefinition 12 easily yields the direct solution given here
Recall 1198831 triceps skinfold thickness 119883
2 thigh cir-
cumference and 1198833 mid-arm circumference and from
(27) that (1198831 1198833) are strongly linked Invoking the cutoff
rule [24] for VIFs in excess of 40 it is seen for G =
[0 0 0] in Table 12 that VF119888(1205731) and VF
119888(1205733) are excessive
where [X1X2X3] are forced to be mutually uncorrelated
as Reference Remarkably similar values are reported forG isin [1 0 0] [ 0 0 1] [1 0 1] all gaged against intractableReference models wherein (119883
1 1198833) are postulated to be
uncorrelated however incrediblyOn the other hand VF
119888s for G isin [0 1 0] [1 1 0]
[0 1 1] in Table 12 are likewise comparable where (X1X3)
are allowed to remain linked at X10158401X3= 242362 Here the
VF119888s reflect negligible changes in efficiency of the estimates
Advances in Decision Sciences 13
Table 9 Matrices X10158400X0 and (X10158400X0)minus1 for the body fat data under model119872
0centered to minima and scaled to equal lengths
X10158400X0
(X10158400X0)minus1
20 194365 194893 192934 03388 minus01284 minus01482 minus00203194365 25 194533 242362 minus01284 07199 00299 minus06224194893 194533 25 196832 minus01482 00299 01711 minus00494192934 242362 196832 25 minus00203 minus06224 minus00494 06979
Table 10 Matrices R119862and Rminus1
119862for the body fat data adjusted to give off-diagonal elements the value 00 with code G = [0 0 0]
R119862
Rminus1119862
20 194365 194893 192934 05083 minus01590 minus01622 minus01510194365 25 189402 187499 minus01590 01636 0 0194893 189402 25 188007 minus01622 0 01664 0192934 187499 188007 25 minus01510 0 0 01565
in comparison with the originalX10158400X0 where [X
1X2X3] are
all linked In summary negligible changes in overall efficiencywould accrue on recasting the experiment so that (X
1X2)
and (X2X3) were pairwise uncorrelated
Table 12 conveys further useful information on notingthat VFs as ratios are multiplicative and divisible Forexample take the model codedG = [1 1 0] whereX1015840
1X2and
X10158401X3are retained at their initial values under X1015840
0X0 namely
194533 and 242362 from Table 9 with (X2X3) unlinked
Now taking G = [0 1 0] as reference we extract the VFson dividing elements in Table 12 at G = [0 1 0] by those ofG = [1 1 0] The result is
[VF (1205730) VF (120573
1) VF (120573
2) VF (120573
3)]
= [09196
0984810073
1005710282
1020810208
10208]
= [09338 10017 10072 10000]
(29)
8 Conclusions
Our goal is clarity in the actual workings of VIFs as ill-conditioning diagnostics thus to identify and rectifymiscon-ceptions regarding the use of VIFs for models X
0= [1119899X]
with intercept Would-be arbiters for ldquocorrectrdquo usage havedivided between VIF
119888s for mean-centered regressors and
VIF119906s for uncentered data We take issue with both Conven-
tional but clouded vision holds that (i) VIF119906s gage variance
inflation owing to nonorthogonal columns of X0as vectors
in R119899 (ii) VIFs ge 10 and (iii) models having orthogonalmean-centered regressors are ldquoidealrdquo in possessing VIF
119888s of
value unity Reprising Remark 3 these properties widely andcorrectly understood for models in119872
119908 appear to have been
expropriated without verification for models in1198720hence the
anomaliesAccordingly we have distinguished vector-space orthog-
onality (119881perp) of columns of X0 in contrast to orthogonality
(119862perp) of mean-centered columns of X A key feature is theconstruction of second-moment arrays as Reference models(R119881R119862) encoding orthogonalities of these types and their
Variance Factors as VF119907rsquos and VF
119888rsquos respectively Contrary
to convention we demonstrate analytically and through casestudies that (i1015840) VIF
119906s do not gage variance inflation (ii1015840)
VFs ge 10 need not hold whereas VF lt 10 representsVariance Deflation and (iii1015840) models having orthogonalcentered regressors are not necessarily ldquoidealrdquo In particularvariance deflation occurs when ill-conditioned data yieldsmaller variances than corresponding orthogonal surrogates
In short our studies call out to adjudicate the prolongeddispute regarding centered and uncentered diagnostics formodels in119872
0 Our summary conclusions follow
(S1) The choice of a model with or without intercept isa substantive matter in the context of each exper-imental paradigm That choice is fixed beforehandin a particular setting is structural and is beyondthe scope of the present study Snee and Marquardt[10] correctly assert that VIFs are fully creditedand remain undisputed in models without interceptthese quantities unambiguously retain the meaningsoriginally intended namely as ratios of variances togage effects of nonorthogonal regressors For modelswith intercept the choice is between centered anduncentered diagnostics which is to be weighed asfollows
(S2) ConventionalVIF119888s apply incompletely to slopes only
not necessarily grasping fully that a system is illconditioned Of particular concern is missing evi-dence on collinearity of regressors with the interceptOn the other hand if 119862
perp-orthogonality pertainsthen VF
119888s may supplant VIF
119888s as applicable to all of
[1205730 1205731 120573
119896] Moreover the latter may reveal that
VF119888(1205730) lt 10 as in cited examples and of special
import in predicting near the origin of the coordinatesystem
(S3) Our VF119907s capture correctly the concept apparently
intended by conventional VIF119906s Specifically if 119881perp-
orthogonality pertains then VF119907s as genuine ratios
of variances may supplant VIF119906s as now discredited
gages for variance inflation(S4) Returning to Section 32 we subscribe fully but for
different reasons to Belsleyrsquos and othersrsquo contention
14 Advances in Decision Sciences
Table 11 Matrices R119862and Rminus1
119862for the body fat data adjusted to give off-diagonal elements the value 00 except X10158401X2 code G = [1 0 0]
R119862
Rminus1119862
20 194365 194893 192934 04839 minus01465 minus01497 minus01510194365 25 194533 187499 minus01465 01648 minus00141 0194893 194533 25 188007 minus01497 minus00141 01676 0192934 187499 188007 25 minus01510 0 0 01565
Table 12 Summary VF119888s for the body fat data centered to minima and scaled to common lengths identified with varying G = [119892
12 11989213 11989223]
from Definition 12
Elements of G VF119888s
11989212
11989213
11989223
1205730
1205731
1205732
1205733
0 0 0 06665 43996 10282 44586
1 0 0 07002 43681 10208 44586
0 1 0 09196 10073 10282 10208
0 0 1 07201 43996 10073 43681
1 1 0 09848 10057 10208 10208
1 0 1 07595 43681 10003 43681
0 1 1 10248 10073 10073 10160
that uncentered VIF119906s are essential in assessing ill-
conditioned data In the geometry of ill-conditioningthese should be retained in the pantheon of regres-sion diagnostics not as now debunked gages ofvariance inflation but as more compelling angularmeasures of the collinearity of each column of X
0=
[1119899X1X2 X
119896] with the subspace spanned by
remaining columns(S5) Specifically the degree of collinearity of regressors
with the intercept is quantified by 1205790deg which
if small is known to ldquocorrupt the estimates of allparameters in the model whether or not the interceptis itself of interest and whether or not the data havebeen centeredrdquo [21 p90]This in turnwould appear toelevate 120579
0in importance as a critical ill-conditioning
diagnostic(S6) In contrast Theorem 4(iii) identifies conventional
VIF119888s with angles between a centered regressor and
the subspace spanned by remaining centered regres-sors For example
1205791= arccos([
[
1 minus1
VIFc ( ldquo1205731)]
]
12
) (30)
is the angle between the centered vector Z1= B119899X1
and the remaining centered vectors These lie in thelinear subspace of R119899 comprising the orthogonalcomplement to 1
119899isin R119899 and thus bear on the
geometry of ill-conditioning in this subspace(S7) Whereas conventional VIFs are ldquoall or nonerdquo to
be gaged against strictly diagonal components ourReference models enable unlinking what may beunlinked while leaving other pairs of regressorslinked This enables us to assess effects on variances
attributed to allowable partial dependencies amongsome but not other regressors
In conclusion the foregoing summary points to limita-tions in modern software packages Minitab v15 119878119875119878119878 v19and 119878119860119878 v92 are standard software packages which computecollinearity diagnostics returning with the default optionsthe centered VIFs VIF
119888(120573119895) 1 le 119895 le 119896 To compute
the uncentered VIF119906s one needs to define a variable for
the constant term and perform a linear regression using theoptions of fitting without an intercept On the other handcomputations for VF
119888s and VF
119907s as introduced here require
additional if straightforward programming for exampleMaple and PROC IML of the SAS System as utilized here
References
[1] D A Belsley ldquoDemeaning conditioning diagnostics throughcenteringrdquo The American Statistician vol 38 no 2 pp 73ndash771984
[2] E R Mansfield and B P Helms ldquoDetecting multicollinearityrdquoThe American Statistician vol 36 no 3 pp 158ndash160 1982
[3] S D Simon and J P Lesage ldquoThe impact of collinearityinvolving the intercept term on the numerical accuracy ofregressionrdquo Computer Science in Economics and Managementvol 1 no 2 pp 137ndash152 1988
[4] D W Marquardt and R D Snee ldquoRidge regression in practicerdquoThe American Statistician vol 29 no 1 pp 3ndash20 1975
[5] K N Berk ldquoTolerance and condition in regression computa-tionsrdquo Journal of the American Statistical Association vol 72 no360 part 1 pp 863ndash866 1977
[6] A E Beaton D Rubin and J Barone ldquoThe acceptability ofregression solutions another look at computational accuracyrdquoJournal of the American Statistical Association vol 71 no 353pp 158ndash168 1976
Advances in Decision Sciences 15
[7] R B Davies and B Hutton ldquoThe effect of errors in theindependent variables in linear regressionrdquo Biometrika vol 62no 2 pp 383ndash391 1975
[8] D W Marquardt ldquoGeneralized inverses ridge regressionbiased linear estimation and nonlinear estimationrdquo Technomet-rics vol 12 no 3 pp 591ndash612 1970
[9] G W Stewart ldquoCollinearity and least squares regressionrdquoStatistical Science vol 2 no 1 pp 68ndash84 1987
[10] R D Snee and D W Marquardt ldquoComment collinearitydiagnostics depend on the domain of prediction themodel andthe datardquo The American Statistician vol 38 no 2 pp 83ndash871984
[11] R HMyers Classical andModern Regression with ApplicationsPWS-Kent Boston Mass USA 2nd edition 1990
[12] D A Belsley E Kuh and R E Welsch Regression DiagnosticsIdentifying Influential Data and Sources of Collinearity JohnWiley amp Sons New York NY USA 1980
[13] A van der Sluis ldquoCondition numbers and equilibration ofmatricesrdquo Numerische Mathematik vol 14 pp 14ndash23 1969
[14] G C S Wang ldquoHow to handle multicollinearity in regressionmodelingrdquo Journal of Business Forecasting Methods amp Systemvol 15 no 1 pp 23ndash27 1996
[15] G C S Wang and C L Jain Regression Analysis Modeling ampForecasting Graceway Flushing NY USA 2003
[16] N Adnan M H Ahmad and R Adnan ldquoA comparative studyon some methods for handling multicollinearity problemsrdquoMatematika UTM vol 22 pp 109ndash119 2006
[17] D R Jensen and D E Ramirez ldquoAnomalies in the foundationsof ridge regressionrdquo International Statistical Review vol 76 pp89ndash105 2008
[18] D R Jensen and D E Ramirez ldquoSurrogate models in ill-conditioned systemsrdquo Journal of Statistical Planning and Infer-ence vol 140 no 7 pp 2069ndash2077 2010
[19] P F Velleman and R E Welsch ldquoEfficient computing ofregression diagnosticsrdquo The American Statistician vol 35 no4 pp 234ndash242 1981
[20] R F Gunst ldquoRegression analysis with multicollinear predictorvariables definition detection and effectsrdquo Communications inStatistics A vol 12 no 19 pp 2217ndash2260 1983
[21] G W Stewart ldquoComment well-conditioned collinearityindicesrdquo Statistical Science vol 2 no 1 pp 86ndash91 1987
[22] D A Belsley ldquoCentering the constant first-differencing andassessing conditioningrdquo in Model Reliability D A Belsley andE Kuh Eds pp 117ndash153 MIT Press Cambridge Mass USA1986
[23] G Smith and F Campbell ldquoA critique of some ridge regressionmethodsrdquo Journal of the American Statistical Association vol 75no 369 pp 74ndash103 1980
[24] R M Orsquobrien ldquoA caution regarding rules of thumb for varianceinflation factorsrdquoQualityampQuantity vol 41 no 5 pp 673ndash6902007
[25] R F Gunst ldquoComment toward a balanced assessment ofcollinearity diagnosticsrdquo The American Statistician vol 38 pp79ndash82 1984
[26] F A Graybill Introduction to Matrices with Applications inStatistics Wadsworth Belmont Calif USA 1969
[27] D Sengupta and P Bhimasankaram ldquoOn the roles of observa-tions in collinearity in the linearmodelrdquo Journal of the AmericanStatistical Association vol 92 no 439 pp 1024ndash1032 1997
[28] J O van Vuuren and W J Conradie ldquoDiagnostics and plotsfor identifying collinearity-influential observations and for clas-sifying multivariate outliersrdquo South African Statistical Journalvol 34 no 2 pp 135ndash161 2000
[29] R R HockingMethods and Applications of LinearModels JohnWiley amp Sons Hoboken NJ USA 2nd edition 2003
[30] R R Hocking and O J Pendleton ldquoThe regression dilemmardquoCommunications in Statistics vol 12 pp 497ndash527 1983
12 Advances in Decision Sciences
Table 7 Matrices R119862and Rminus1
119862for checking 119862
perp-orthogonality in the linked data under model1198720
R119862
Rminus1119862
8 13441 27337 17722 01516 minus00216 minus00484 minus0029113441 8 04593 02977 minus00216 01286 0 027337 04593 8 06056 minus00484 0 01415 017722 02977 06056 8 minus00291 0 0 01314
Table 8 Matrices (Z1015840Z)minus1 and Cminus1 for the linked data under model119872119908
(Z1015840Z)minus1 Cminus1
01265 minus00141 00015 10123 minus01125 00123minus00141 01281 minus00141 minus01125 10247 minus0112500015 minus00141 01265 00123 minus01125 10123
(119894) 119881perp Reference Model To compare this with a 119881
perp-or-thogonal design we turn again to Definition 8 Howeverevaluating the test criterion in Lemma 9(119894) shows that thisconfiguration is not commensurate with 119881
perp-orthogonalityso that the VF
119907s are undefined Comparisons with 119862
perp-orthogonal models are addressed next(119894119894) 119862perp Reference Model As in Definition 12(ii) the centering
matrix is
119899XX1015840 = [
[
188889 189402 187499
189402 189916 188007
187499 188007 186118
]
]
(28)
To check against 119862perp-orthogonality we obtain the matrix Wof Definition 12(ii) on replacing off-diagonal elements of thelower right (3 times 3) submatrix ofX1015840
0X0by corresponding off-
diagonal elements of 119899XX1015840 The result is the matrix R119862and
its inverse as given in Table 10 where diagonal elements ofRminus1119862
are the Reference variances The VF119888s found as ratios
of diagonal elements of (X10158400X0)minus1 relative to those of Rminus1
119862
are listed in Table 12 under G = [0 0 0] the indicator ofDefinition 12(i) for this case
72 Variance Factors and Linkage Traditional gages of ill-conditioning are patently absurd on occasion By conventionVIFs are ldquoall or nonerdquo wherein Reference models entailstrictly diagonal components In practice some regressorsare inextricably linked to get or even visualize orthogonalregressors may go beyond feasible experimental rangesrequire extrapolation beyond practical limits and challengecredulity Response functions so constrained are described in[30] as ldquopicket fencesrdquo In such cases taking C to be diagonalas its 119862perp Reference is moot at best an academic folly abettedin turn by default in standard software packages In shortconventional diagnostics here offer answers to irrelevantquestions Given pairs of regressors irrevocably linked weseek instead to assess effects on variances for other pairsthat could be unlinked by design These comments applyespecially to entries under G = [0 0 0] in Table 12
To proceed we define essential ill-conditioning as regres-sors inextricably linked necessarily to remain so and asnonessential if regressors could be unlinked by design As a
followup to Section 32 for X = 0 we infer that the constantvector is inextricably linked with regressors thus accountingfor essential ill-conditioning not removed by centeringcontrary to claims in [4] Indeed this is the essence ofDefinition 8
Fortunately these limitations may be remedied throughReference models adapted to this purpose This in turnexploits the special notation and trappings of Definition 12(i)as follows In addition toG = [0 0 0] we iterate for indicatorsG = [119892
12 11989213 11989223] taking values in [1 0 0] [0 1 0] [0 0 1]
[1 1 0] [1 0 1] [0 1 1] Values of VF119888s thus obtained are
listed in Table 12 To fix ideas the Reference R119862for G =
[1 0 0] that is for the constraint R119862(2 3) = X1015840
0X0(2 3) =
194533 is given in Table 11 together with its inverse Foundas ratios of diagonal elements of (X1015840
0X0)minus1 relative to those of
Rminus1119862
in Table 11 VF119888s are listed in Table 12 underG = [1 0 0]
In short R119862is ldquoas 119862
perp-orthogonal as it could berdquo given theconstraint Other VF
119888s in Table 12 proceed similarly For all
cases in Table 12 excluding G = [0 1 1] 1205730is estimated
with greater efficiency in the original model than any of thepostulated Reference models
As in Remark 15 the reference matrix R119862
for G =
[1 0 0] is constrained by X10158401X2
= 194533 to be retainedwhereas X1015840
1X3
= 1199091and X1015840
2X3
= 1199092are to be determined
from Rminus1119862(2 4) = Rminus1
119862(3 4) = 0 Although this system
can be solved numerically as noted the algorithm stated inDefinition 12 easily yields the direct solution given here
Recall 1198831 triceps skinfold thickness 119883
2 thigh cir-
cumference and 1198833 mid-arm circumference and from
(27) that (1198831 1198833) are strongly linked Invoking the cutoff
rule [24] for VIFs in excess of 40 it is seen for G =
[0 0 0] in Table 12 that VF119888(1205731) and VF
119888(1205733) are excessive
where [X1X2X3] are forced to be mutually uncorrelated
as Reference Remarkably similar values are reported forG isin [1 0 0] [ 0 0 1] [1 0 1] all gaged against intractableReference models wherein (119883
1 1198833) are postulated to be
uncorrelated however incrediblyOn the other hand VF
119888s for G isin [0 1 0] [1 1 0]
[0 1 1] in Table 12 are likewise comparable where (X1X3)
are allowed to remain linked at X10158401X3= 242362 Here the
VF119888s reflect negligible changes in efficiency of the estimates
Advances in Decision Sciences 13
Table 9 Matrices X10158400X0 and (X10158400X0)minus1 for the body fat data under model119872
0centered to minima and scaled to equal lengths
X10158400X0
(X10158400X0)minus1
20 194365 194893 192934 03388 minus01284 minus01482 minus00203194365 25 194533 242362 minus01284 07199 00299 minus06224194893 194533 25 196832 minus01482 00299 01711 minus00494192934 242362 196832 25 minus00203 minus06224 minus00494 06979
Table 10 Matrices R119862and Rminus1
119862for the body fat data adjusted to give off-diagonal elements the value 00 with code G = [0 0 0]
R119862
Rminus1119862
20 194365 194893 192934 05083 minus01590 minus01622 minus01510194365 25 189402 187499 minus01590 01636 0 0194893 189402 25 188007 minus01622 0 01664 0192934 187499 188007 25 minus01510 0 0 01565
in comparison with the originalX10158400X0 where [X
1X2X3] are
all linked In summary negligible changes in overall efficiencywould accrue on recasting the experiment so that (X
1X2)
and (X2X3) were pairwise uncorrelated
Table 12 conveys further useful information on notingthat VFs as ratios are multiplicative and divisible Forexample take the model codedG = [1 1 0] whereX1015840
1X2and
X10158401X3are retained at their initial values under X1015840
0X0 namely
194533 and 242362 from Table 9 with (X2X3) unlinked
Now taking G = [0 1 0] as reference we extract the VFson dividing elements in Table 12 at G = [0 1 0] by those ofG = [1 1 0] The result is
[VF (1205730) VF (120573
1) VF (120573
2) VF (120573
3)]
= [09196
0984810073
1005710282
1020810208
10208]
= [09338 10017 10072 10000]
(29)
8 Conclusions
Our goal is clarity in the actual workings of VIFs as ill-conditioning diagnostics thus to identify and rectifymiscon-ceptions regarding the use of VIFs for models X
0= [1119899X]
with intercept Would-be arbiters for ldquocorrectrdquo usage havedivided between VIF
119888s for mean-centered regressors and
VIF119906s for uncentered data We take issue with both Conven-
tional but clouded vision holds that (i) VIF119906s gage variance
inflation owing to nonorthogonal columns of X0as vectors
in R119899 (ii) VIFs ge 10 and (iii) models having orthogonalmean-centered regressors are ldquoidealrdquo in possessing VIF
119888s of
value unity Reprising Remark 3 these properties widely andcorrectly understood for models in119872
119908 appear to have been
expropriated without verification for models in1198720hence the
anomaliesAccordingly we have distinguished vector-space orthog-
onality (119881perp) of columns of X0 in contrast to orthogonality
(119862perp) of mean-centered columns of X A key feature is theconstruction of second-moment arrays as Reference models(R119881R119862) encoding orthogonalities of these types and their
Variance Factors as VF119907rsquos and VF
119888rsquos respectively Contrary
to convention we demonstrate analytically and through casestudies that (i1015840) VIF
119906s do not gage variance inflation (ii1015840)
VFs ge 10 need not hold whereas VF lt 10 representsVariance Deflation and (iii1015840) models having orthogonalcentered regressors are not necessarily ldquoidealrdquo In particularvariance deflation occurs when ill-conditioned data yieldsmaller variances than corresponding orthogonal surrogates
In short our studies call out to adjudicate the prolongeddispute regarding centered and uncentered diagnostics formodels in119872
0 Our summary conclusions follow
(S1) The choice of a model with or without intercept isa substantive matter in the context of each exper-imental paradigm That choice is fixed beforehandin a particular setting is structural and is beyondthe scope of the present study Snee and Marquardt[10] correctly assert that VIFs are fully creditedand remain undisputed in models without interceptthese quantities unambiguously retain the meaningsoriginally intended namely as ratios of variances togage effects of nonorthogonal regressors For modelswith intercept the choice is between centered anduncentered diagnostics which is to be weighed asfollows
(S2) ConventionalVIF119888s apply incompletely to slopes only
not necessarily grasping fully that a system is illconditioned Of particular concern is missing evi-dence on collinearity of regressors with the interceptOn the other hand if 119862
perp-orthogonality pertainsthen VF
119888s may supplant VIF
119888s as applicable to all of
[1205730 1205731 120573
119896] Moreover the latter may reveal that
VF119888(1205730) lt 10 as in cited examples and of special
import in predicting near the origin of the coordinatesystem
(S3) Our VF119907s capture correctly the concept apparently
intended by conventional VIF119906s Specifically if 119881perp-
orthogonality pertains then VF119907s as genuine ratios
of variances may supplant VIF119906s as now discredited
gages for variance inflation(S4) Returning to Section 32 we subscribe fully but for
different reasons to Belsleyrsquos and othersrsquo contention
14 Advances in Decision Sciences
Table 11 Matrices R119862and Rminus1
119862for the body fat data adjusted to give off-diagonal elements the value 00 except X10158401X2 code G = [1 0 0]
R119862
Rminus1119862
20 194365 194893 192934 04839 minus01465 minus01497 minus01510194365 25 194533 187499 minus01465 01648 minus00141 0194893 194533 25 188007 minus01497 minus00141 01676 0192934 187499 188007 25 minus01510 0 0 01565
Table 12 Summary VF119888s for the body fat data centered to minima and scaled to common lengths identified with varying G = [119892
12 11989213 11989223]
from Definition 12
Elements of G VF119888s
11989212
11989213
11989223
1205730
1205731
1205732
1205733
0 0 0 06665 43996 10282 44586
1 0 0 07002 43681 10208 44586
0 1 0 09196 10073 10282 10208
0 0 1 07201 43996 10073 43681
1 1 0 09848 10057 10208 10208
1 0 1 07595 43681 10003 43681
0 1 1 10248 10073 10073 10160
that uncentered VIF119906s are essential in assessing ill-
conditioned data In the geometry of ill-conditioningthese should be retained in the pantheon of regres-sion diagnostics not as now debunked gages ofvariance inflation but as more compelling angularmeasures of the collinearity of each column of X
0=
[1119899X1X2 X
119896] with the subspace spanned by
remaining columns(S5) Specifically the degree of collinearity of regressors
with the intercept is quantified by 1205790deg which
if small is known to ldquocorrupt the estimates of allparameters in the model whether or not the interceptis itself of interest and whether or not the data havebeen centeredrdquo [21 p90]This in turnwould appear toelevate 120579
0in importance as a critical ill-conditioning
diagnostic(S6) In contrast Theorem 4(iii) identifies conventional
VIF119888s with angles between a centered regressor and
the subspace spanned by remaining centered regres-sors For example
1205791= arccos([
[
1 minus1
VIFc ( ldquo1205731)]
]
12
) (30)
is the angle between the centered vector Z1= B119899X1
and the remaining centered vectors These lie in thelinear subspace of R119899 comprising the orthogonalcomplement to 1
119899isin R119899 and thus bear on the
geometry of ill-conditioning in this subspace(S7) Whereas conventional VIFs are ldquoall or nonerdquo to
be gaged against strictly diagonal components ourReference models enable unlinking what may beunlinked while leaving other pairs of regressorslinked This enables us to assess effects on variances
attributed to allowable partial dependencies amongsome but not other regressors
In conclusion the foregoing summary points to limita-tions in modern software packages Minitab v15 119878119875119878119878 v19and 119878119860119878 v92 are standard software packages which computecollinearity diagnostics returning with the default optionsthe centered VIFs VIF
119888(120573119895) 1 le 119895 le 119896 To compute
the uncentered VIF119906s one needs to define a variable for
the constant term and perform a linear regression using theoptions of fitting without an intercept On the other handcomputations for VF
119888s and VF
119907s as introduced here require
additional if straightforward programming for exampleMaple and PROC IML of the SAS System as utilized here
References
[1] D A Belsley ldquoDemeaning conditioning diagnostics throughcenteringrdquo The American Statistician vol 38 no 2 pp 73ndash771984
[2] E R Mansfield and B P Helms ldquoDetecting multicollinearityrdquoThe American Statistician vol 36 no 3 pp 158ndash160 1982
[3] S D Simon and J P Lesage ldquoThe impact of collinearityinvolving the intercept term on the numerical accuracy ofregressionrdquo Computer Science in Economics and Managementvol 1 no 2 pp 137ndash152 1988
[4] D W Marquardt and R D Snee ldquoRidge regression in practicerdquoThe American Statistician vol 29 no 1 pp 3ndash20 1975
[5] K N Berk ldquoTolerance and condition in regression computa-tionsrdquo Journal of the American Statistical Association vol 72 no360 part 1 pp 863ndash866 1977
[6] A E Beaton D Rubin and J Barone ldquoThe acceptability ofregression solutions another look at computational accuracyrdquoJournal of the American Statistical Association vol 71 no 353pp 158ndash168 1976
Advances in Decision Sciences 15
[7] R B Davies and B Hutton ldquoThe effect of errors in theindependent variables in linear regressionrdquo Biometrika vol 62no 2 pp 383ndash391 1975
[8] D W Marquardt ldquoGeneralized inverses ridge regressionbiased linear estimation and nonlinear estimationrdquo Technomet-rics vol 12 no 3 pp 591ndash612 1970
[9] G W Stewart ldquoCollinearity and least squares regressionrdquoStatistical Science vol 2 no 1 pp 68ndash84 1987
[10] R D Snee and D W Marquardt ldquoComment collinearitydiagnostics depend on the domain of prediction themodel andthe datardquo The American Statistician vol 38 no 2 pp 83ndash871984
[11] R HMyers Classical andModern Regression with ApplicationsPWS-Kent Boston Mass USA 2nd edition 1990
[12] D A Belsley E Kuh and R E Welsch Regression DiagnosticsIdentifying Influential Data and Sources of Collinearity JohnWiley amp Sons New York NY USA 1980
[13] A van der Sluis ldquoCondition numbers and equilibration ofmatricesrdquo Numerische Mathematik vol 14 pp 14ndash23 1969
[14] G C S Wang ldquoHow to handle multicollinearity in regressionmodelingrdquo Journal of Business Forecasting Methods amp Systemvol 15 no 1 pp 23ndash27 1996
[15] G C S Wang and C L Jain Regression Analysis Modeling ampForecasting Graceway Flushing NY USA 2003
[16] N Adnan M H Ahmad and R Adnan ldquoA comparative studyon some methods for handling multicollinearity problemsrdquoMatematika UTM vol 22 pp 109ndash119 2006
[17] D R Jensen and D E Ramirez ldquoAnomalies in the foundationsof ridge regressionrdquo International Statistical Review vol 76 pp89ndash105 2008
[18] D R Jensen and D E Ramirez ldquoSurrogate models in ill-conditioned systemsrdquo Journal of Statistical Planning and Infer-ence vol 140 no 7 pp 2069ndash2077 2010
[19] P F Velleman and R E Welsch ldquoEfficient computing ofregression diagnosticsrdquo The American Statistician vol 35 no4 pp 234ndash242 1981
[20] R F Gunst ldquoRegression analysis with multicollinear predictorvariables definition detection and effectsrdquo Communications inStatistics A vol 12 no 19 pp 2217ndash2260 1983
[21] G W Stewart ldquoComment well-conditioned collinearityindicesrdquo Statistical Science vol 2 no 1 pp 86ndash91 1987
[22] D A Belsley ldquoCentering the constant first-differencing andassessing conditioningrdquo in Model Reliability D A Belsley andE Kuh Eds pp 117ndash153 MIT Press Cambridge Mass USA1986
[23] G Smith and F Campbell ldquoA critique of some ridge regressionmethodsrdquo Journal of the American Statistical Association vol 75no 369 pp 74ndash103 1980
[24] R M Orsquobrien ldquoA caution regarding rules of thumb for varianceinflation factorsrdquoQualityampQuantity vol 41 no 5 pp 673ndash6902007
[25] R F Gunst ldquoComment toward a balanced assessment ofcollinearity diagnosticsrdquo The American Statistician vol 38 pp79ndash82 1984
[26] F A Graybill Introduction to Matrices with Applications inStatistics Wadsworth Belmont Calif USA 1969
[27] D Sengupta and P Bhimasankaram ldquoOn the roles of observa-tions in collinearity in the linearmodelrdquo Journal of the AmericanStatistical Association vol 92 no 439 pp 1024ndash1032 1997
[28] J O van Vuuren and W J Conradie ldquoDiagnostics and plotsfor identifying collinearity-influential observations and for clas-sifying multivariate outliersrdquo South African Statistical Journalvol 34 no 2 pp 135ndash161 2000
[29] R R HockingMethods and Applications of LinearModels JohnWiley amp Sons Hoboken NJ USA 2nd edition 2003
[30] R R Hocking and O J Pendleton ldquoThe regression dilemmardquoCommunications in Statistics vol 12 pp 497ndash527 1983
Advances in Decision Sciences 13
Table 9 Matrices X10158400X0 and (X10158400X0)minus1 for the body fat data under model119872
0centered to minima and scaled to equal lengths
X10158400X0
(X10158400X0)minus1
20 194365 194893 192934 03388 minus01284 minus01482 minus00203194365 25 194533 242362 minus01284 07199 00299 minus06224194893 194533 25 196832 minus01482 00299 01711 minus00494192934 242362 196832 25 minus00203 minus06224 minus00494 06979
Table 10 Matrices R119862and Rminus1
119862for the body fat data adjusted to give off-diagonal elements the value 00 with code G = [0 0 0]
R119862
Rminus1119862
20 194365 194893 192934 05083 minus01590 minus01622 minus01510194365 25 189402 187499 minus01590 01636 0 0194893 189402 25 188007 minus01622 0 01664 0192934 187499 188007 25 minus01510 0 0 01565
in comparison with the originalX10158400X0 where [X
1X2X3] are
all linked In summary negligible changes in overall efficiencywould accrue on recasting the experiment so that (X
1X2)
and (X2X3) were pairwise uncorrelated
Table 12 conveys further useful information on notingthat VFs as ratios are multiplicative and divisible Forexample take the model codedG = [1 1 0] whereX1015840
1X2and
X10158401X3are retained at their initial values under X1015840
0X0 namely
194533 and 242362 from Table 9 with (X2X3) unlinked
Now taking G = [0 1 0] as reference we extract the VFson dividing elements in Table 12 at G = [0 1 0] by those ofG = [1 1 0] The result is
[VF (1205730) VF (120573
1) VF (120573
2) VF (120573
3)]
= [09196
0984810073
1005710282
1020810208
10208]
= [09338 10017 10072 10000]
(29)
8 Conclusions
Our goal is clarity in the actual workings of VIFs as ill-conditioning diagnostics thus to identify and rectifymiscon-ceptions regarding the use of VIFs for models X
0= [1119899X]
with intercept Would-be arbiters for ldquocorrectrdquo usage havedivided between VIF
119888s for mean-centered regressors and
VIF119906s for uncentered data We take issue with both Conven-
tional but clouded vision holds that (i) VIF119906s gage variance
inflation owing to nonorthogonal columns of X0as vectors
in R119899 (ii) VIFs ge 10 and (iii) models having orthogonalmean-centered regressors are ldquoidealrdquo in possessing VIF
119888s of
value unity Reprising Remark 3 these properties widely andcorrectly understood for models in119872
119908 appear to have been
expropriated without verification for models in1198720hence the
anomaliesAccordingly we have distinguished vector-space orthog-
onality (119881perp) of columns of X0 in contrast to orthogonality
(119862perp) of mean-centered columns of X A key feature is theconstruction of second-moment arrays as Reference models(R119881R119862) encoding orthogonalities of these types and their
Variance Factors as VF119907rsquos and VF
119888rsquos respectively Contrary
to convention we demonstrate analytically and through casestudies that (i1015840) VIF
119906s do not gage variance inflation (ii1015840)
VFs ge 10 need not hold whereas VF lt 10 representsVariance Deflation and (iii1015840) models having orthogonalcentered regressors are not necessarily ldquoidealrdquo In particularvariance deflation occurs when ill-conditioned data yieldsmaller variances than corresponding orthogonal surrogates
In short our studies call out to adjudicate the prolongeddispute regarding centered and uncentered diagnostics formodels in119872
0 Our summary conclusions follow
(S1) The choice of a model with or without intercept isa substantive matter in the context of each exper-imental paradigm That choice is fixed beforehandin a particular setting is structural and is beyondthe scope of the present study Snee and Marquardt[10] correctly assert that VIFs are fully creditedand remain undisputed in models without interceptthese quantities unambiguously retain the meaningsoriginally intended namely as ratios of variances togage effects of nonorthogonal regressors For modelswith intercept the choice is between centered anduncentered diagnostics which is to be weighed asfollows
(S2) ConventionalVIF119888s apply incompletely to slopes only
not necessarily grasping fully that a system is illconditioned Of particular concern is missing evi-dence on collinearity of regressors with the interceptOn the other hand if 119862
perp-orthogonality pertainsthen VF
119888s may supplant VIF
119888s as applicable to all of
[1205730 1205731 120573
119896] Moreover the latter may reveal that
VF119888(1205730) lt 10 as in cited examples and of special
import in predicting near the origin of the coordinatesystem
(S3) Our VF119907s capture correctly the concept apparently
intended by conventional VIF119906s Specifically if 119881perp-
orthogonality pertains then VF119907s as genuine ratios
of variances may supplant VIF119906s as now discredited
gages for variance inflation(S4) Returning to Section 32 we subscribe fully but for
different reasons to Belsleyrsquos and othersrsquo contention
14 Advances in Decision Sciences
Table 11 Matrices R119862and Rminus1
119862for the body fat data adjusted to give off-diagonal elements the value 00 except X10158401X2 code G = [1 0 0]
R119862
Rminus1119862
20 194365 194893 192934 04839 minus01465 minus01497 minus01510194365 25 194533 187499 minus01465 01648 minus00141 0194893 194533 25 188007 minus01497 minus00141 01676 0192934 187499 188007 25 minus01510 0 0 01565
Table 12 Summary VF119888s for the body fat data centered to minima and scaled to common lengths identified with varying G = [119892
12 11989213 11989223]
from Definition 12
Elements of G VF119888s
11989212
11989213
11989223
1205730
1205731
1205732
1205733
0 0 0 06665 43996 10282 44586
1 0 0 07002 43681 10208 44586
0 1 0 09196 10073 10282 10208
0 0 1 07201 43996 10073 43681
1 1 0 09848 10057 10208 10208
1 0 1 07595 43681 10003 43681
0 1 1 10248 10073 10073 10160
that uncentered VIF119906s are essential in assessing ill-
conditioned data In the geometry of ill-conditioningthese should be retained in the pantheon of regres-sion diagnostics not as now debunked gages ofvariance inflation but as more compelling angularmeasures of the collinearity of each column of X
0=
[1119899X1X2 X
119896] with the subspace spanned by
remaining columns(S5) Specifically the degree of collinearity of regressors
with the intercept is quantified by 1205790deg which
if small is known to ldquocorrupt the estimates of allparameters in the model whether or not the interceptis itself of interest and whether or not the data havebeen centeredrdquo [21 p90]This in turnwould appear toelevate 120579
0in importance as a critical ill-conditioning
diagnostic(S6) In contrast Theorem 4(iii) identifies conventional
VIF119888s with angles between a centered regressor and
the subspace spanned by remaining centered regres-sors For example
1205791= arccos([
[
1 minus1
VIFc ( ldquo1205731)]
]
12
) (30)
is the angle between the centered vector Z1= B119899X1
and the remaining centered vectors These lie in thelinear subspace of R119899 comprising the orthogonalcomplement to 1
119899isin R119899 and thus bear on the
geometry of ill-conditioning in this subspace(S7) Whereas conventional VIFs are ldquoall or nonerdquo to
be gaged against strictly diagonal components ourReference models enable unlinking what may beunlinked while leaving other pairs of regressorslinked This enables us to assess effects on variances
attributed to allowable partial dependencies amongsome but not other regressors
In conclusion the foregoing summary points to limita-tions in modern software packages Minitab v15 119878119875119878119878 v19and 119878119860119878 v92 are standard software packages which computecollinearity diagnostics returning with the default optionsthe centered VIFs VIF
119888(120573119895) 1 le 119895 le 119896 To compute
the uncentered VIF119906s one needs to define a variable for
the constant term and perform a linear regression using theoptions of fitting without an intercept On the other handcomputations for VF
119888s and VF
119907s as introduced here require
additional if straightforward programming for exampleMaple and PROC IML of the SAS System as utilized here
References
[1] D A Belsley ldquoDemeaning conditioning diagnostics throughcenteringrdquo The American Statistician vol 38 no 2 pp 73ndash771984
[2] E R Mansfield and B P Helms ldquoDetecting multicollinearityrdquoThe American Statistician vol 36 no 3 pp 158ndash160 1982
[3] S D Simon and J P Lesage ldquoThe impact of collinearityinvolving the intercept term on the numerical accuracy ofregressionrdquo Computer Science in Economics and Managementvol 1 no 2 pp 137ndash152 1988
[4] D W Marquardt and R D Snee ldquoRidge regression in practicerdquoThe American Statistician vol 29 no 1 pp 3ndash20 1975
[5] K N Berk ldquoTolerance and condition in regression computa-tionsrdquo Journal of the American Statistical Association vol 72 no360 part 1 pp 863ndash866 1977
[6] A E Beaton D Rubin and J Barone ldquoThe acceptability ofregression solutions another look at computational accuracyrdquoJournal of the American Statistical Association vol 71 no 353pp 158ndash168 1976
Advances in Decision Sciences 15
[7] R B Davies and B Hutton ldquoThe effect of errors in theindependent variables in linear regressionrdquo Biometrika vol 62no 2 pp 383ndash391 1975
[8] D W Marquardt ldquoGeneralized inverses ridge regressionbiased linear estimation and nonlinear estimationrdquo Technomet-rics vol 12 no 3 pp 591ndash612 1970
[9] G W Stewart ldquoCollinearity and least squares regressionrdquoStatistical Science vol 2 no 1 pp 68ndash84 1987
[10] R D Snee and D W Marquardt ldquoComment collinearitydiagnostics depend on the domain of prediction themodel andthe datardquo The American Statistician vol 38 no 2 pp 83ndash871984
[11] R HMyers Classical andModern Regression with ApplicationsPWS-Kent Boston Mass USA 2nd edition 1990
[12] D A Belsley E Kuh and R E Welsch Regression DiagnosticsIdentifying Influential Data and Sources of Collinearity JohnWiley amp Sons New York NY USA 1980
[13] A van der Sluis ldquoCondition numbers and equilibration ofmatricesrdquo Numerische Mathematik vol 14 pp 14ndash23 1969
[14] G C S Wang ldquoHow to handle multicollinearity in regressionmodelingrdquo Journal of Business Forecasting Methods amp Systemvol 15 no 1 pp 23ndash27 1996
[15] G C S Wang and C L Jain Regression Analysis Modeling ampForecasting Graceway Flushing NY USA 2003
[16] N Adnan M H Ahmad and R Adnan ldquoA comparative studyon some methods for handling multicollinearity problemsrdquoMatematika UTM vol 22 pp 109ndash119 2006
[17] D R Jensen and D E Ramirez ldquoAnomalies in the foundationsof ridge regressionrdquo International Statistical Review vol 76 pp89ndash105 2008
[18] D R Jensen and D E Ramirez ldquoSurrogate models in ill-conditioned systemsrdquo Journal of Statistical Planning and Infer-ence vol 140 no 7 pp 2069ndash2077 2010
[19] P F Velleman and R E Welsch ldquoEfficient computing ofregression diagnosticsrdquo The American Statistician vol 35 no4 pp 234ndash242 1981
[20] R F Gunst ldquoRegression analysis with multicollinear predictorvariables definition detection and effectsrdquo Communications inStatistics A vol 12 no 19 pp 2217ndash2260 1983
[21] G W Stewart ldquoComment well-conditioned collinearityindicesrdquo Statistical Science vol 2 no 1 pp 86ndash91 1987
[22] D A Belsley ldquoCentering the constant first-differencing andassessing conditioningrdquo in Model Reliability D A Belsley andE Kuh Eds pp 117ndash153 MIT Press Cambridge Mass USA1986
[23] G Smith and F Campbell ldquoA critique of some ridge regressionmethodsrdquo Journal of the American Statistical Association vol 75no 369 pp 74ndash103 1980
[24] R M Orsquobrien ldquoA caution regarding rules of thumb for varianceinflation factorsrdquoQualityampQuantity vol 41 no 5 pp 673ndash6902007
[25] R F Gunst ldquoComment toward a balanced assessment ofcollinearity diagnosticsrdquo The American Statistician vol 38 pp79ndash82 1984
[26] F A Graybill Introduction to Matrices with Applications inStatistics Wadsworth Belmont Calif USA 1969
[27] D Sengupta and P Bhimasankaram ldquoOn the roles of observa-tions in collinearity in the linearmodelrdquo Journal of the AmericanStatistical Association vol 92 no 439 pp 1024ndash1032 1997
[28] J O van Vuuren and W J Conradie ldquoDiagnostics and plotsfor identifying collinearity-influential observations and for clas-sifying multivariate outliersrdquo South African Statistical Journalvol 34 no 2 pp 135ndash161 2000
[29] R R HockingMethods and Applications of LinearModels JohnWiley amp Sons Hoboken NJ USA 2nd edition 2003
[30] R R Hocking and O J Pendleton ldquoThe regression dilemmardquoCommunications in Statistics vol 12 pp 497ndash527 1983
14 Advances in Decision Sciences
Table 11 Matrices R119862and Rminus1
119862for the body fat data adjusted to give off-diagonal elements the value 00 except X10158401X2 code G = [1 0 0]
R119862
Rminus1119862
20 194365 194893 192934 04839 minus01465 minus01497 minus01510194365 25 194533 187499 minus01465 01648 minus00141 0194893 194533 25 188007 minus01497 minus00141 01676 0192934 187499 188007 25 minus01510 0 0 01565
Table 12 Summary VF119888s for the body fat data centered to minima and scaled to common lengths identified with varying G = [119892
12 11989213 11989223]
from Definition 12
Elements of G VF119888s
11989212
11989213
11989223
1205730
1205731
1205732
1205733
0 0 0 06665 43996 10282 44586
1 0 0 07002 43681 10208 44586
0 1 0 09196 10073 10282 10208
0 0 1 07201 43996 10073 43681
1 1 0 09848 10057 10208 10208
1 0 1 07595 43681 10003 43681
0 1 1 10248 10073 10073 10160
that uncentered VIF119906s are essential in assessing ill-
conditioned data In the geometry of ill-conditioningthese should be retained in the pantheon of regres-sion diagnostics not as now debunked gages ofvariance inflation but as more compelling angularmeasures of the collinearity of each column of X
0=
[1119899X1X2 X
119896] with the subspace spanned by
remaining columns(S5) Specifically the degree of collinearity of regressors
with the intercept is quantified by 1205790deg which
if small is known to ldquocorrupt the estimates of allparameters in the model whether or not the interceptis itself of interest and whether or not the data havebeen centeredrdquo [21 p90]This in turnwould appear toelevate 120579
0in importance as a critical ill-conditioning
diagnostic(S6) In contrast Theorem 4(iii) identifies conventional
VIF119888s with angles between a centered regressor and
the subspace spanned by remaining centered regres-sors For example
1205791= arccos([
[
1 minus1
VIFc ( ldquo1205731)]
]
12
) (30)
is the angle between the centered vector Z1= B119899X1
and the remaining centered vectors These lie in thelinear subspace of R119899 comprising the orthogonalcomplement to 1
119899isin R119899 and thus bear on the
geometry of ill-conditioning in this subspace(S7) Whereas conventional VIFs are ldquoall or nonerdquo to
be gaged against strictly diagonal components ourReference models enable unlinking what may beunlinked while leaving other pairs of regressorslinked This enables us to assess effects on variances
attributed to allowable partial dependencies amongsome but not other regressors
In conclusion the foregoing summary points to limita-tions in modern software packages Minitab v15 119878119875119878119878 v19and 119878119860119878 v92 are standard software packages which computecollinearity diagnostics returning with the default optionsthe centered VIFs VIF
119888(120573119895) 1 le 119895 le 119896 To compute
the uncentered VIF119906s one needs to define a variable for
the constant term and perform a linear regression using theoptions of fitting without an intercept On the other handcomputations for VF
119888s and VF
119907s as introduced here require
additional if straightforward programming for exampleMaple and PROC IML of the SAS System as utilized here
References
[1] D A Belsley ldquoDemeaning conditioning diagnostics throughcenteringrdquo The American Statistician vol 38 no 2 pp 73ndash771984
[2] E R Mansfield and B P Helms ldquoDetecting multicollinearityrdquoThe American Statistician vol 36 no 3 pp 158ndash160 1982
[3] S D Simon and J P Lesage ldquoThe impact of collinearityinvolving the intercept term on the numerical accuracy ofregressionrdquo Computer Science in Economics and Managementvol 1 no 2 pp 137ndash152 1988
[4] D W Marquardt and R D Snee ldquoRidge regression in practicerdquoThe American Statistician vol 29 no 1 pp 3ndash20 1975
[5] K N Berk ldquoTolerance and condition in regression computa-tionsrdquo Journal of the American Statistical Association vol 72 no360 part 1 pp 863ndash866 1977
[6] A E Beaton D Rubin and J Barone ldquoThe acceptability ofregression solutions another look at computational accuracyrdquoJournal of the American Statistical Association vol 71 no 353pp 158ndash168 1976
Advances in Decision Sciences 15
[7] R B Davies and B Hutton ldquoThe effect of errors in theindependent variables in linear regressionrdquo Biometrika vol 62no 2 pp 383ndash391 1975
[8] D W Marquardt ldquoGeneralized inverses ridge regressionbiased linear estimation and nonlinear estimationrdquo Technomet-rics vol 12 no 3 pp 591ndash612 1970
[9] G W Stewart ldquoCollinearity and least squares regressionrdquoStatistical Science vol 2 no 1 pp 68ndash84 1987
[10] R D Snee and D W Marquardt ldquoComment collinearitydiagnostics depend on the domain of prediction themodel andthe datardquo The American Statistician vol 38 no 2 pp 83ndash871984
[11] R HMyers Classical andModern Regression with ApplicationsPWS-Kent Boston Mass USA 2nd edition 1990
[12] D A Belsley E Kuh and R E Welsch Regression DiagnosticsIdentifying Influential Data and Sources of Collinearity JohnWiley amp Sons New York NY USA 1980
[13] A van der Sluis ldquoCondition numbers and equilibration ofmatricesrdquo Numerische Mathematik vol 14 pp 14ndash23 1969
[14] G C S Wang ldquoHow to handle multicollinearity in regressionmodelingrdquo Journal of Business Forecasting Methods amp Systemvol 15 no 1 pp 23ndash27 1996
[15] G C S Wang and C L Jain Regression Analysis Modeling ampForecasting Graceway Flushing NY USA 2003
[16] N Adnan M H Ahmad and R Adnan ldquoA comparative studyon some methods for handling multicollinearity problemsrdquoMatematika UTM vol 22 pp 109ndash119 2006
[17] D R Jensen and D E Ramirez ldquoAnomalies in the foundationsof ridge regressionrdquo International Statistical Review vol 76 pp89ndash105 2008
[18] D R Jensen and D E Ramirez ldquoSurrogate models in ill-conditioned systemsrdquo Journal of Statistical Planning and Infer-ence vol 140 no 7 pp 2069ndash2077 2010
[19] P F Velleman and R E Welsch ldquoEfficient computing ofregression diagnosticsrdquo The American Statistician vol 35 no4 pp 234ndash242 1981
[20] R F Gunst ldquoRegression analysis with multicollinear predictorvariables definition detection and effectsrdquo Communications inStatistics A vol 12 no 19 pp 2217ndash2260 1983
[21] G W Stewart ldquoComment well-conditioned collinearityindicesrdquo Statistical Science vol 2 no 1 pp 86ndash91 1987
[22] D A Belsley ldquoCentering the constant first-differencing andassessing conditioningrdquo in Model Reliability D A Belsley andE Kuh Eds pp 117ndash153 MIT Press Cambridge Mass USA1986
[23] G Smith and F Campbell ldquoA critique of some ridge regressionmethodsrdquo Journal of the American Statistical Association vol 75no 369 pp 74ndash103 1980
[24] R M Orsquobrien ldquoA caution regarding rules of thumb for varianceinflation factorsrdquoQualityampQuantity vol 41 no 5 pp 673ndash6902007
[25] R F Gunst ldquoComment toward a balanced assessment ofcollinearity diagnosticsrdquo The American Statistician vol 38 pp79ndash82 1984
[26] F A Graybill Introduction to Matrices with Applications inStatistics Wadsworth Belmont Calif USA 1969
[27] D Sengupta and P Bhimasankaram ldquoOn the roles of observa-tions in collinearity in the linearmodelrdquo Journal of the AmericanStatistical Association vol 92 no 439 pp 1024ndash1032 1997
[28] J O van Vuuren and W J Conradie ldquoDiagnostics and plotsfor identifying collinearity-influential observations and for clas-sifying multivariate outliersrdquo South African Statistical Journalvol 34 no 2 pp 135ndash161 2000
[29] R R HockingMethods and Applications of LinearModels JohnWiley amp Sons Hoboken NJ USA 2nd edition 2003
[30] R R Hocking and O J Pendleton ldquoThe regression dilemmardquoCommunications in Statistics vol 12 pp 497ndash527 1983
Advances in Decision Sciences 15
[7] R B Davies and B Hutton ldquoThe effect of errors in theindependent variables in linear regressionrdquo Biometrika vol 62no 2 pp 383ndash391 1975
[8] D W Marquardt ldquoGeneralized inverses ridge regressionbiased linear estimation and nonlinear estimationrdquo Technomet-rics vol 12 no 3 pp 591ndash612 1970
[9] G W Stewart ldquoCollinearity and least squares regressionrdquoStatistical Science vol 2 no 1 pp 68ndash84 1987
[10] R D Snee and D W Marquardt ldquoComment collinearitydiagnostics depend on the domain of prediction themodel andthe datardquo The American Statistician vol 38 no 2 pp 83ndash871984
[11] R HMyers Classical andModern Regression with ApplicationsPWS-Kent Boston Mass USA 2nd edition 1990
[12] D A Belsley E Kuh and R E Welsch Regression DiagnosticsIdentifying Influential Data and Sources of Collinearity JohnWiley amp Sons New York NY USA 1980
[13] A van der Sluis ldquoCondition numbers and equilibration ofmatricesrdquo Numerische Mathematik vol 14 pp 14ndash23 1969
[14] G C S Wang ldquoHow to handle multicollinearity in regressionmodelingrdquo Journal of Business Forecasting Methods amp Systemvol 15 no 1 pp 23ndash27 1996
[15] G C S Wang and C L Jain Regression Analysis Modeling ampForecasting Graceway Flushing NY USA 2003
[16] N Adnan M H Ahmad and R Adnan ldquoA comparative studyon some methods for handling multicollinearity problemsrdquoMatematika UTM vol 22 pp 109ndash119 2006
[17] D R Jensen and D E Ramirez ldquoAnomalies in the foundationsof ridge regressionrdquo International Statistical Review vol 76 pp89ndash105 2008
[18] D R Jensen and D E Ramirez ldquoSurrogate models in ill-conditioned systemsrdquo Journal of Statistical Planning and Infer-ence vol 140 no 7 pp 2069ndash2077 2010
[19] P F Velleman and R E Welsch ldquoEfficient computing ofregression diagnosticsrdquo The American Statistician vol 35 no4 pp 234ndash242 1981
[20] R F Gunst ldquoRegression analysis with multicollinear predictorvariables definition detection and effectsrdquo Communications inStatistics A vol 12 no 19 pp 2217ndash2260 1983
[21] G W Stewart ldquoComment well-conditioned collinearityindicesrdquo Statistical Science vol 2 no 1 pp 86ndash91 1987
[22] D A Belsley ldquoCentering the constant first-differencing andassessing conditioningrdquo in Model Reliability D A Belsley andE Kuh Eds pp 117ndash153 MIT Press Cambridge Mass USA1986
[23] G Smith and F Campbell ldquoA critique of some ridge regressionmethodsrdquo Journal of the American Statistical Association vol 75no 369 pp 74ndash103 1980
[24] R M Orsquobrien ldquoA caution regarding rules of thumb for varianceinflation factorsrdquoQualityampQuantity vol 41 no 5 pp 673ndash6902007
[25] R F Gunst ldquoComment toward a balanced assessment ofcollinearity diagnosticsrdquo The American Statistician vol 38 pp79ndash82 1984
[26] F A Graybill Introduction to Matrices with Applications inStatistics Wadsworth Belmont Calif USA 1969
[27] D Sengupta and P Bhimasankaram ldquoOn the roles of observa-tions in collinearity in the linearmodelrdquo Journal of the AmericanStatistical Association vol 92 no 439 pp 1024ndash1032 1997
[28] J O van Vuuren and W J Conradie ldquoDiagnostics and plotsfor identifying collinearity-influential observations and for clas-sifying multivariate outliersrdquo South African Statistical Journalvol 34 no 2 pp 135ndash161 2000
[29] R R HockingMethods and Applications of LinearModels JohnWiley amp Sons Hoboken NJ USA 2nd edition 2003
[30] R R Hocking and O J Pendleton ldquoThe regression dilemmardquoCommunications in Statistics vol 12 pp 497ndash527 1983