Stat Methods Appl
DOI 10.1007/s10260-014-0259-6

Some new aspects of taxicab correspondence analysis

Vartan Choulakian · Biagio Simonetti · Thu Pham Gia

Accepted: 20 February 2014
© Springer-Verlag Berlin Heidelberg 2014

V. Choulakian (B) · T. Pham Gia
Université de Moncton, Moncton, Canada
e-mail: vartan.choulakian@umoncton.ca

B. Simonetti
University of Sannio, Benevento, Italy

Abstract  Correspondence analysis (CA) and nonsymmetric correspondence analysis are based on generalized singular value decomposition, and, in general, they are not equivalent. Taxicab correspondence analysis (TCA) is an L1 variant of CA, and it is based on the generalized taxicab singular value decomposition (GTSVD). Our aim is to study the taxicab variant of nonsymmetric correspondence analysis. We find that for diagonal metric matrices the GTSVDs of a given data set are equivalent; from which we deduce the equivalence of TCA and taxicab nonsymmetric correspondence analysis. We also attempt to show that TCA stays as close as possible to the original correspondence matrix without calculating a dissimilarity (or similarity) measure between rows or columns. Further, we discuss some new geometric and distance aspects of TCA.

Keywords  Taxicab correspondence analysis · Taxicab nonsymmetric correspondence analysis · Generalized taxicab singular value decomposition

1 Introduction

Correspondence analysis (CA) and nonsymmetric correspondence analysis (NSCA) are based on generalized singular value decomposition (GSVD) of a contingency table, but in general they are not equivalent. These methods are used for visualizing the association in contingency tables between the row and column variables: CA is related to the Chi-squared statistic, while NSCA is related to the Goodman–Kruskal tau index. Taxicab correspondence analysis (TCA) is an L1 variant of CA introduced by Choulakian (2006a). In this paper we address the problem of developing a taxicab variant of nonsymmetric correspondence analysis (TNSCA) and two other, recently proposed, variants of correspondence analysis. To do this, we formally introduce the generalized taxicab singular value decomposition (GTSVD), such that TCA becomes a particular case of it. We find the following interesting property of GTSVD: for diagonal metric matrices all GTSVDs of a given data set are equivalent; and as a corollary, we get the result that TCA and TNSCA are equivalent.

The mathematical theory of CA, as developed by Benzécri (1973, 1992), is very elegant and beautiful. However, by comparing CA and TCA as particular derived cases of GSVD and GTSVD respectively, we notice an interesting property of the taxicab approach: it stays as close as possible to the original correspondence matrix without calculating a dissimilarity (or similarity) measure between rows or columns. Further, we show some new geometric and distance aspects of TCA and TNSCA.

This paper is organized as follows: in Sect. 2, we present the main general results concerning GSVD and GTSVD of matrices; Sect. 3 presents the derivations of a few CA methods, two of them mentioned above; Sect. 4 discusses some new geometric and distance aspects of TCA; and we conclude in Sect. 5.

The theory of CA can be found, among others, in Benzécri (1973, 1992), Greenacre (1984), Gifi (1990), Nishisato (1994), Le Roux and Rouanet (2004) and Murtagh (2005); the theory of NSCA can be found, among others, in Lauro and D'Ambra (1984) and Balbi (1998).

2 Main general results

2.1 Introduction

In a series of papers Choulakian (2003, 2005, 2006a, b) developed SVD on normed finite dimensional spaces (Banach spaces). To embed the ordinary SVD into a larger class, we need some notation and basic results from functional analysis, see for instance Kreyszig (1978). The $L_p$ vector norm of a vector $\mathbf{v} = (v_1, \ldots, v_m)'$ is defined to be $||\mathbf{v}||_p = (\sum_{i=1}^{m} |v_i|^p)^{1/p}$ for $p \ge 1$, and $||\mathbf{v}||_\infty = \max_i |v_i|$. The two numbers $(p, q)$ are conjugate pairs if $p \ge 1$, $q \ge 1$ and $1/p + 1/q = 1$; $l_q^I = (\mathbb{R}^I, ||\cdot||_q)$ represents the finite $I$-dimensional Banach space equipped with the norm $||\cdot||_q$. A matrix $X$ of size $I \times J$ is considered a linear and bounded operator from $l_q^J$ to $l_p^I$. The transpose of $X$, denoted $X'$, is an operator from the dual space $(l_p^I)^* = l_{p_1}^I$ to the dual space $(l_q^J)^* = l_{q_1}^J$, where $p$ and $p_1$ (respectively, $q$ and $q_1$) are conjugate. We consider the following maximization problem

$\max ||X\mathbf{u}||_p$ subject to $||\mathbf{u}||_q = 1$ for $p, q \ge 1$.  (1)

Now, the dual formulation of (1) is

$\max ||X'\mathbf{v}||_{q_1}$ subject to $||\mathbf{v}||_{p_1} = 1$,  (2)

where $(p, p_1)$ and $(q, q_1)$ are conjugate pairs. The optimisation problems (1) and (2) can be reexpressed as matrix norms or operator norms

$\lambda = \max_{\mathbf{u} \in \mathbb{R}^J} \dfrac{||X\mathbf{u}||_p}{||\mathbf{u}||_q} = \max_{\mathbf{v} \in \mathbb{R}^I} \dfrac{||X'\mathbf{v}||_{q_1}}{||\mathbf{v}||_{p_1}} = \max_{\mathbf{u} \in \mathbb{R}^J,\, \mathbf{v} \in \mathbb{R}^I} \dfrac{\mathbf{v}'X\mathbf{u}}{||\mathbf{u}||_q\, ||\mathbf{v}||_{p_1}}.$  (3)

The equations in (3) follow from a well-known theorem in functional analysis on the norm of the adjoint operator, see for instance Kreyszig (1978, p. 232).

The geometric interpretation of (1) is that $\mathbf{u} \in l_q^J$ is a normed axis on which the rows of $X$ are projected, and $\mathbf{a} = X\mathbf{u}$ is the projected point in $l_p^I$. Likewise, the geometric interpretation of (2) is that $\mathbf{v} \in l_{p_1}^I$ is a normed axis on which the columns of $X$ are projected, and $\mathbf{b} = X'\mathbf{v}$ is the projected point in $l_{q_1}^J$. So the interpretation of (1, 2) is in the spirit of principal components analysis, and it is valid for the rest of the paper.

Two matrix norms which are central in this paper are the SVD matrix norm, defined for $(p, q) = (2, 2)$, and the TSVD matrix norm, defined for $(p, q) = (1, \infty)$. We present both SVD and TSVD of a matrix $X$ as similar stepwise matrix decomposition methods (refer to Choulakian (2008b) for more details), where an ordered sequence of quintuplets $(\mathbf{a}_\alpha, \mathbf{b}_\alpha, \mathbf{v}_\alpha, \mathbf{u}_\alpha, \lambda_\alpha) \in \mathbb{R}^I \times \mathbb{R}^J \times \mathbb{R}^I \times \mathbb{R}^J \times \mathbb{R}$ for $\alpha = 1, \ldots, k$ is computed via (1), (2) or (3) such that

$X = \sum_{\alpha=1}^{k} \mathbf{a}_\alpha \mathbf{b}'_\alpha / \lambda_\alpha,$  (4)

where $k = \mathrm{rank}(X)$ and the $\lambda_\alpha$, $\alpha = 1, \ldots, k$, are in decreasing order.

• In the case of SVD of $X$ we have the following well-known relations, see "Singular value decomposition" in the Appendix:

$\mathbf{v}_\alpha = \mathbf{a}_\alpha / \lambda_\alpha$ and $\mathbf{u}_\alpha = \mathbf{b}_\alpha / \lambda_\alpha$,  (5)

$||\mathbf{a}_\alpha||_2 = \mathbf{a}'_\alpha \mathbf{v}_\alpha = ||\mathbf{b}_\alpha||_2 = \mathbf{b}'_\alpha \mathbf{u}_\alpha = \lambda_\alpha$ for $\alpha = 1, \ldots, k$,  (6)

$\mathbf{a}'_\alpha \mathbf{v}_\beta = \mathbf{b}'_\alpha \mathbf{u}_\beta = 0$ for $\alpha \ne \beta$.  (7)

• In the case of TSVD of $X$ we have the following relations, see "Taxicab singular value decomposition" in the Appendix:

$\mathbf{v}_\alpha = \mathrm{sgn}(\mathbf{a}_\alpha)$ and $\mathbf{u}_\alpha = \mathrm{sgn}(\mathbf{b}_\alpha)$,  (8)

$||\mathbf{a}_\alpha||_1 = \mathbf{a}'_\alpha \mathbf{v}_\alpha = ||\mathbf{b}_\alpha||_1 = \mathbf{b}'_\alpha \mathbf{u}_\alpha = \lambda_\alpha$ for $\alpha = 1, \ldots, k$,  (9)

$\mathbf{a}'_\alpha \mathbf{v}_\beta = \mathbf{b}'_\alpha \mathbf{u}_\beta = 0$ for $\alpha > \beta$,  (10)

where $\mathrm{sgn}(\mathbf{b}_\alpha) = (\mathrm{sgn}(b_\alpha(1)), \ldots, \mathrm{sgn}(b_\alpha(J)))'$, and $\mathrm{sgn}(b_\alpha(j)) = 1$ if $b_\alpha(j) > 0$, $\mathrm{sgn}(b_\alpha(j)) = -1$ otherwise.

Let $\Omega_I$ and $\Omega_J$ be positive definite metric matrices of size $I \times I$ and $J \times J$, respectively. This means that the diagonal elements of $\Omega_I$ and $\Omega_J$ are strictly positive. GSVD and GTSVD of a matrix $Y$ with respect to the metric matrices $\Omega_I$ and $\Omega_J$ are matrix decomposition methods where an ordered sequence of quintuplets $(\mathbf{f}_\alpha, \mathbf{g}_\alpha, \mathbf{t}_\alpha, \mathbf{s}_\alpha, \lambda_\alpha) \in \mathbb{R}^I \times \mathbb{R}^J \times \mathbb{R}^I \times \mathbb{R}^J \times \mathbb{R}$ is computed for $\alpha = 1, \ldots, k$, where $k = \mathrm{rank}(Y)$ and the $\lambda_\alpha$ are in decreasing order. For further details on GSVD, see for instance Greenacre (1984). In the next subsections we show how GSVD is obtained from SVD and GTSVD is obtained from TSVD.

2.2 Generalized singular value decomposition

GSVD of a matrix $Y$ with respect to the metric matrices $\Omega_I$ and $\Omega_J$ is the representation (14) subject to (15, 16, 17); it is calculated via SVD of (4) in the following way:

Step 1: Let $X = \Omega_I^{1/2} Y \Omega_J^{1/2}$;

Step 2: Compute the SVD of $X$, that is:

$X = \Omega_I^{1/2} Y \Omega_J^{1/2} = \sum_{\alpha=1}^{k} \mathbf{a}_\alpha \mathbf{b}'_\alpha / \lambda_\alpha$;  (11)

Step 3: Define the following transformations:

$\mathbf{a}_\alpha = \Omega_I^{1/2} \mathbf{f}_\alpha$ and $\mathbf{b}_\alpha = \Omega_J^{1/2} \mathbf{g}_\alpha$;  (12)

$\mathbf{v}_\alpha = \Omega_I^{1/2} \mathbf{t}_\alpha$ and $\mathbf{u}_\alpha = \Omega_J^{1/2} \mathbf{s}_\alpha$;  (13)

Step 4: Substitute (12) in (11) and get:

$Y = \sum_{\alpha=1}^{k} \mathbf{f}_\alpha \mathbf{g}'_\alpha / \lambda_\alpha$;  (14)

Step 5: Substitute (12, 13) in (5, 6, 7) and get:

$\mathbf{t}_\alpha = \mathbf{f}_\alpha / \lambda_\alpha$ and $\mathbf{s}_\alpha = \mathbf{g}_\alpha / \lambda_\alpha$,  (15)

$||\mathbf{f}_\alpha||_{2,\Omega_I} = \mathbf{f}'_\alpha \Omega_I \mathbf{t}_\alpha = ||\mathbf{g}_\alpha||_{2,\Omega_J} = \mathbf{g}'_\alpha \Omega_J \mathbf{s}_\alpha = \lambda_\alpha$ for $\alpha = 1, \ldots, k$,  (16)

$\mathbf{f}'_\alpha \Omega_I \mathbf{t}_\beta = \mathbf{g}'_\alpha \Omega_J \mathbf{s}_\beta = 0$ for $\alpha \ne \beta$.  (17)

Equation (16) says that the $\Omega_I$-weighted $L_2$ norm of $\mathbf{f}_\alpha$ is $\lambda_\alpha$; likewise, Eq. (17) says that $\mathbf{f}_\alpha$ is $\Omega_I$-orthogonal to $\mathbf{t}_\beta$ for $\alpha \ne \beta$. It is important to note that $\mathbf{t}_\beta$ is in the dual space of $\mathbf{f}_\beta$. Note that when $\Omega_I$ and $\Omega_J$ are identity matrices, then GSVD = SVD.
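As a small illustration of Steps 1–5 (ours, not part of the original paper), the following NumPy sketch computes the GSVD of an arbitrary matrix with respect to two randomly chosen diagonal metric matrices and checks relations (14), (16) and (17); all names and the test data are hypothetical.

```python
import numpy as np

def gsvd(Y, wI, wJ):
    """GSVD of Y w.r.t. diagonal metrics Omega_I = diag(wI), Omega_J = diag(wJ).

    Sketch of Steps 1-5: SVD of X = Omega_I^{1/2} Y Omega_J^{1/2}, then the
    back-transformations (12)-(13). Returns factor scores F, G, dual axes T, S
    and the singular values lam (columns indexed by alpha)."""
    sI, sJ = np.sqrt(wI), np.sqrt(wJ)
    X = sI[:, None] * Y * sJ[None, :]                    # Step 1
    V, lam, Ut = np.linalg.svd(X, full_matrices=False)   # Step 2: X = V diag(lam) Ut
    k = np.sum(lam > 1e-12)
    V, lam, U = V[:, :k], lam[:k], Ut[:k].T
    A, B = V * lam, U * lam                              # a_alpha = lam v_alpha, b_alpha = lam u_alpha
    F, G = A / sI[:, None], B / sJ[:, None]              # (12): f = Omega_I^{-1/2} a, g = Omega_J^{-1/2} b
    T, S = V / sI[:, None], U / sJ[:, None]              # (13): t = Omega_I^{-1/2} v, s = Omega_J^{-1/2} u
    return F, G, T, S, lam

rng = np.random.default_rng(0)
Y = rng.normal(size=(5, 4))
wI, wJ = rng.uniform(0.5, 2.0, 5), rng.uniform(0.5, 2.0, 4)
F, G, T, S, lam = gsvd(Y, wI, wJ)

# (14): Y = sum_alpha f_alpha g_alpha' / lambda_alpha
print(np.allclose(Y, (F / lam) @ G.T))
# (16)-(17): f' Omega_I t = lambda on the diagonal, zero off the diagonal (same for g, s)
print(np.allclose(F.T @ np.diag(wI) @ T, np.diag(lam)))
print(np.allclose(G.T @ np.diag(wJ) @ S, np.diag(lam)))
```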


2.3 Generalized taxicab singular value decomposition

Following Choulakian (2006a), GTSVD of a matrix $Y$ with respect to the metric matrices $\Omega_I$ and $\Omega_J$ is the representation (20) subject to (21, 22, 23); it is calculated via TSVD of $Y$ in the following way:

Step 1: Compute the TSVD of $Y$, that is:

$Y = \sum_{\alpha=1}^{k} \mathbf{a}_\alpha \mathbf{b}'_\alpha / \lambda_\alpha$;  (18)

Step 2: Define the following transformations:

$\mathbf{a}_\alpha = \Omega_I \mathbf{f}_\alpha$ and $\mathbf{b}_\alpha = \Omega_J \mathbf{g}_\alpha$;  (19)

Step 3: Substitute (19) in (18) and get

$Y = \sum_{\alpha=1}^{k} \Omega_I \mathbf{f}_\alpha \mathbf{g}'_\alpha \Omega_J / \lambda_\alpha$;  (20)

Step 4: Substitute (19) in (8, 9, 10) and get:

$\mathbf{t}_\alpha = \mathrm{sgn}(\mathbf{f}_\alpha) = \mathrm{sgn}(\Omega_I \mathbf{f}_\alpha)$ and $\mathbf{s}_\alpha = \mathrm{sgn}(\mathbf{g}_\alpha) = \mathrm{sgn}(\Omega_J \mathbf{g}_\alpha)$,  (21)

$||\mathbf{f}_\alpha||_{1,\Omega_I} = \mathbf{f}'_\alpha \Omega_I \mathbf{t}_\alpha = ||\mathbf{g}_\alpha||_{1,\Omega_J} = \mathbf{g}'_\alpha \Omega_J \mathbf{s}_\alpha = \lambda_\alpha$ for $\alpha = 1, \ldots, k$,  (22)

$\mathbf{f}'_\alpha \Omega_I \mathbf{t}_\beta = \mathbf{g}'_\alpha \Omega_J \mathbf{s}_\beta = 0$ for $\alpha > \beta$.  (23)

Equation (22) says that the $\Omega_I$-weighted $L_1$ norm of $\mathbf{f}_\alpha$ is $\lambda_\alpha$; likewise, Eq. (23) says that $\mathbf{f}_\alpha$ is $\Omega_I$-orthogonal to $\mathbf{t}_\beta$ for $\alpha > \beta$. It is important to note that $\mathbf{t}_\beta$ is in the dual space of $\mathbf{f}_\beta$. Note that when $\Omega_I$ and $\Omega_J$ are identity matrices, then GTSVD = TSVD.

Note that (21) is valid if and only if $\Omega_I$ and $\Omega_J$ are diagonal metric matrices, which we state as a

Caveat: GTSVD is defined only for diagonal metric matrices; it corresponds to a simple change of scale of the TSVD representation.

Hereafter we designate diagonal metric matrices by $D$.

By comparing GSVD and GTSVD, we see that GSVD is a very general and versatile matrix decomposition method with an excellent least squares global optimality property, the famous Eckart–Young theorem; while GTSVD is an extremely restrictive matrix decomposition method. However, we have the following important

Lemma  Let $Y$ be a matrix and $D_I$, $D_J$, $D_I^*$ and $D_J^*$ diagonal metric matrices; then the GTSVD of $Y$ with respect to the metric matrices $D_I$ and $D_J$ and the GTSVD of $Y$ with respect to the metric matrices $D_I^*$ and $D_J^*$ are equivalent.

Proof  Step 1 in (18) does not depend on the metrics, and it is common to both decompositions. Diagonal metric matrices only change the scales of the factor scores.

Note that the GSVD representation of $Y$ in (14) is different from the GTSVD representation of $Y$ in (20). In the next section we apply these results to four different variants of CA and their taxicab versions. $\square$
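The Lemma can be illustrated numerically. The following sketch (ours, assuming NumPy; the TSVD here is computed by complete enumeration of $\mathbf{u} \in \{-1,+1\}^J$, so it is feasible only for small $J$) applies GTSVD to the same matrix with two different pairs of diagonal metrics and checks that the $\lambda$'s and the signed axes coincide, while the factor scores differ only by a diagonal change of scale.

```python
import itertools
import numpy as np

def tsvd(Y, tol=1e-10):
    """TSVD of Y (Step 1 of GTSVD): at each step maximize ||X u||_1 over
    u in {-1,+1}^J by complete enumeration, then deflate as in (59)."""
    X = np.array(Y, dtype=float)
    A, B, lams = [], [], []
    for _ in range(min(X.shape)):
        cands = [np.array(u) for u in itertools.product((-1.0, 1.0), repeat=X.shape[1])]
        u = max(cands, key=lambda w: np.abs(X @ w).sum())
        a = X @ u
        lam = np.abs(a).sum()
        if lam < tol:
            break
        b = X.T @ np.sign(a)                  # transition formulas (56)-(58)
        A.append(a); B.append(b); lams.append(lam)
        X = X - np.outer(a, b) / lam          # deflation, eq. (59)
    return np.array(A).T, np.array(B).T, np.array(lams)

def gtsvd(Y, dI, dJ):
    """GTSVD with diagonal metrics D_I = diag(dI), D_J = diag(dJ):
    TSVD of Y followed by the rescaling (19): f = D_I^{-1} a, g = D_J^{-1} b."""
    A, B, lams = tsvd(Y)
    return A / dI[:, None], B / dJ[:, None], lams

rng = np.random.default_rng(1)
Y = rng.normal(size=(6, 4))
dI1, dJ1 = rng.uniform(0.5, 2.0, size=6), rng.uniform(0.5, 2.0, size=4)
dI2, dJ2 = rng.uniform(0.5, 2.0, size=6), rng.uniform(0.5, 2.0, size=4)

F1, G1, lam1 = gtsvd(Y, dI1, dJ1)
F2, G2, lam2 = gtsvd(Y, dI2, dJ2)

# Same lambdas and same signed axes (21) under both metric pairs ...
print(np.allclose(lam1, lam2))
print(np.allclose(np.sign(F1), np.sign(F2)) and np.allclose(np.sign(G1), np.sign(G2)))
# ... and the factor scores differ only by a diagonal change of scale:
# D_I f_alpha equals the same vector a_alpha in both decompositions.
print(np.allclose(F1 * dI1[:, None], F2 * dI2[:, None]))
```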

3 Derivations

Let $N = (n_{ij})$ be a contingency table cross-classifying two nominal variables with $I$ rows and $J$ columns, and $P = N/n$ be the associated correspondence matrix with elements $p_{ij}$, where $n = \sum_{j=1}^{J} \sum_{i=1}^{I} n_{ij}$ is the sample size. We define as usual $p_{i*} = \sum_{j=1}^{J} p_{ij}$, $p_{*j} = \sum_{i=1}^{I} p_{ij}$, the vector $\mathbf{r} = (p_{i*}) \in \mathbb{R}^I$, the vector $\mathbf{c} = (p_{*j}) \in \mathbb{R}^J$, and $D_r = \mathrm{Diag}(\mathbf{r})$ the diagonal matrix having diagonal elements $p_{i*}$, and similarly $D_c = \mathrm{Diag}(\mathbf{c})$. Let $k = \mathrm{rank}(Y_m)$ for $m = 1, \ldots, 5$.

3.1 Methods derived from GSVD

• CA of $P$ corresponds to the GSVD of the matrix

$Y_1 = D_r^{-1}(P - \mathbf{r}\mathbf{c}')D_c^{-1} = D_r^{-1} P D_c^{-1} - \mathbf{1}_I \mathbf{1}'_J,$

with respect to the diagonal metrics $D_r$ and $D_c$, see for instance Greenacre (1984). It considers the row and column categorical variables in a symmetric way. $\mathbf{1}_I$ designates a vector of ones of size $I$.

• NSCA of $P$ corresponds to the GSVD of the matrix

$Y_2 = (P - \mathbf{r}\mathbf{c}')D_c^{-1} = P D_c^{-1} - \mathbf{r}\mathbf{1}'_J,$

with respect to the diagonal metrics $I_r$ and $D_c$, where $I_r$ is the identity matrix of size $I$; see among others Takane and Jung (2009). It treats the column categorical variables as predictors and the row categorical variables as responses.

• CA of adjusted residuals of $P$, introduced by Beh (2012), corresponds to the GSVD of the matrix

$Y_3 = D_r^{-1}(I - D_r)^{-1/2}(P - \mathbf{r}\mathbf{c}')(I - D_c)^{-1/2}D_c^{-1} = (I - D_r)^{-1/2}(D_r^{-1} P D_c^{-1} - \mathbf{1}_I \mathbf{1}'_J)(I - D_c)^{-1/2} = (I - D_r)^{-1/2}\, Y_1\, (I - D_c)^{-1/2},$

with respect to the diagonal metrics $D_r$ and $D_c$. Beh's argument in proposing CA of adjusted residuals is that, according to Haberman (1973) or Agresti (2002), under the independence assumption of the row and column variables the maximum likelihood estimate of the variance of $\sqrt{n}\, p_{ij}$ is $p_{i*}\, p_{*j}(1 - p_{i*})(1 - p_{*j})$, using a Poisson or multinomial distribution.

• CA-raw of $P$, introduced by Greenacre (2010), corresponds to the GSVD of the matrix

$Y_4 = D_{ru}^{-1}\left(P - \dfrac{\mathbf{1}_I}{I}\,\mathbf{c}'\right)D_c^{-1} = I\, P D_c^{-1} - \mathbf{1}_I \mathbf{1}'_J,$

with respect to the diagonal metrics $D_{ru} = I_r/I$ and $D_c$. Greenacre (2010) introduced this variant of CA for sites × species abundance data and used the acronym CA-raw to designate it. His argument in proposing CA-raw was: usually, in ecology (sites × species data), the sites are of equal size; so CA-raw compares the species profiles with the uniform profiles across sites.

The four variants of CA of $P$ derived above are not equivalent: knowing one of them we cannot obtain the others.
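For concreteness, the following sketch (ours, assuming NumPy; the 3 × 4 table is arbitrary) constructs the four matrices $Y_1$–$Y_4$ from a contingency table $N$ following the definitions above, and checks the equivalent "profile" forms given in the text.

```python
import numpy as np

N = np.array([[10., 5., 3., 2.],
              [ 4., 8., 6., 2.],
              [ 2., 3., 9., 6.]])            # an arbitrary I x J contingency table
P = N / N.sum()                              # correspondence matrix
r, c = P.sum(axis=1), P.sum(axis=0)          # row and column masses
I, J = P.shape
E = np.outer(r, c)                           # independence model r c'

Y1 = (P - E) / np.outer(r, c)                        # CA: D_r^{-1}(P - rc') D_c^{-1}
Y2 = (P - E) / c                                     # NSCA: (P - rc') D_c^{-1}
Y3 = Y1 / np.sqrt(np.outer(1.0 - r, 1.0 - c))        # CA of adjusted residuals (Beh 2012)
Y4 = I * (P - np.outer(np.full(I, 1.0 / I), c)) / c  # CA-raw: D_ru^{-1}(P - (1_I/I)c') D_c^{-1}

# equivalent forms given in the text
print(np.allclose(Y1, P / np.outer(r, c) - 1.0))
print(np.allclose(Y2, P / c - r[:, None]))
print(np.allclose(Y4, I * P / c - 1.0))
```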

3.2 Methods derived from GTSVD

• TCA of P corresponds to the GTSVD of the matrix

Y5 = (P − rc′),

with respect to the diagonal metrics Dr and Dc. By (20), we get

$P = \mathbf{r}\mathbf{c}' + \sum_{\alpha=1}^{k} D_r \mathbf{f}_\alpha \mathbf{g}'_\alpha D_c / \lambda_\alpha = D_r\left(\mathbf{1}_I \mathbf{1}'_J + \sum_{\alpha=1}^{k} \mathbf{f}_\alpha \mathbf{g}'_\alpha / \lambda_\alpha\right) D_c,$

or elementwise,

$p_{ij} = p_{i*}\, p_{*j}\left[1 + \sum_{\alpha=1}^{k} f_\alpha(i)\, g_\alpha(j)/\lambda_\alpha\right],$  (24)

which is the data reconstruction formula in both CA and TCA.

• TNSCA of $P$ corresponds to the GTSVD of the matrix

Y5 = (P − rc′),


with respect to the diagonal metrics Ir and Dc. By (20) we get

$P = \mathbf{r}\mathbf{c}' + \sum_{\alpha=1}^{k} \mathbf{a}_\alpha \mathbf{g}'_\alpha D_c / \lambda_\alpha = \left(\mathbf{r}\mathbf{1}'_J + \sum_{\alpha=1}^{k} \mathbf{a}_\alpha \mathbf{g}'_\alpha / \lambda_\alpha\right) D_c,$

which is the data reconstruction formula in both NSCA and TNSCA; it can be rewritten in the more familiar form as

$P D_c^{-1} - \mathbf{r}\mathbf{1}'_J = \sum_{\alpha=1}^{k} \mathbf{a}_\alpha \mathbf{g}'_\alpha / \lambda_\alpha,$

or elementwise,

$(p_{ij}/p_{*j} - p_{i*}) = \sum_{\alpha=1}^{k} a_\alpha(i)\, g_\alpha(j)/\lambda_\alpha.$  (25)

• TCA of adjusted residuals of P corresponds to the GTSVD of the matrix

$Y_6 = (I - D_r)^{-1/2}(P - \mathbf{r}\mathbf{c}')(I - D_c)^{-1/2},$

with respect to the diagonal metrics $D_r$ and $D_c$. By (20) we get

$(I - D_r)^{-1/2}(P - \mathbf{r}\mathbf{c}')(I - D_c)^{-1/2} = \sum_{\alpha=1}^{k} D_r \mathbf{f}_\alpha \mathbf{g}'_\alpha D_c / \lambda_\alpha,$

which is equivalent to

$D_r^{-1}(I - D_r)^{-1/2}(P - \mathbf{r}\mathbf{c}')(I - D_c)^{-1/2} D_c^{-1} = \sum_{\alpha=1}^{k} \mathbf{f}_\alpha \mathbf{g}'_\alpha / \lambda_\alpha,$

or elementwise,

$\dfrac{p_{ij} - p_{i*}\, p_{*j}}{p_{i*}\, p_{*j}\sqrt{(1 - p_{i*})(1 - p_{*j})}} = \sum_{\alpha=1}^{k} f_\alpha(i)\, g_\alpha(j)/\lambda_\alpha,$

which is the data reconstruction formula in both CA and TCA of adjusted residuals.

• TCA-raw of P corresponds to the GTSVD of the matrix

$Y_7 = \left(P - \dfrac{\mathbf{1}_I}{I}\,\mathbf{c}'\right),$

with respect to the diagonal metrics $D_{ru} = I_r/I$ and $D_c$. By (20) we get

$P = \dfrac{\mathbf{1}_I}{I}\,\mathbf{c}' + \sum_{\alpha=1}^{k} D_{ru} \mathbf{f}_\alpha \mathbf{g}'_\alpha D_c / \lambda_\alpha = D_{ru}\left(\mathbf{1}_I \mathbf{1}'_J + \sum_{\alpha=1}^{k} \mathbf{f}_\alpha \mathbf{g}'_\alpha / \lambda_\alpha\right) D_c,$

which can be rewritten as

$D_{ru}^{-1} P D_c^{-1} - \mathbf{1}_I \mathbf{1}'_J = \sum_{\alpha=1}^{k} \mathbf{f}_\alpha \mathbf{g}'_\alpha / \lambda_\alpha,$

or elementwise,

$(I\, p_{ij}/p_{*j} - 1) = \sum_{\alpha=1}^{k} f_\alpha(i)\, g_\alpha(j)/\lambda_\alpha.$
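The reconstruction formulas (24) and (25) can be checked numerically. The sketch below (ours, assuming NumPy; it reuses the same enumeration-based TSVD helper as in Sect. 2.3 and is therefore meant for small tables only) computes the GTSVD of $Y_5 = P - \mathbf{r}\mathbf{c}'$ and verifies both formulas on an arbitrary 3 × 4 table.

```python
import itertools
import numpy as np

def tsvd(Y, tol=1e-12):
    """TSVD by complete enumeration of u in {-1,+1}^J with deflation (small J only)."""
    X = np.array(Y, dtype=float)
    A, B, lams = [], [], []
    for _ in range(min(X.shape)):
        cands = [np.array(u) for u in itertools.product((-1.0, 1.0), repeat=X.shape[1])]
        u = max(cands, key=lambda w: np.abs(X @ w).sum())
        a = X @ u
        lam = np.abs(a).sum()
        if lam < tol:
            break
        b = X.T @ np.sign(a)
        A.append(a); B.append(b); lams.append(lam)
        X = X - np.outer(a, b) / lam
    return np.array(A).T, np.array(B).T, np.array(lams)

N = np.array([[10., 5., 3., 2.],
              [ 4., 8., 6., 2.],
              [ 2., 3., 9., 6.]])
P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)

A, B, lam = tsvd(P - np.outer(r, c))    # GTSVD Step 1 applied to Y5 = P - rc'
F = A / r[:, None]                      # TCA row factor scores, f = D_r^{-1} a
G = B / c[:, None]                      # TCA column factor scores, g = D_c^{-1} b

# (24): p_ij = p_i* p_*j [1 + sum_alpha f_alpha(i) g_alpha(j) / lambda_alpha]
recon = np.outer(r, c) * (1.0 + (F / lam) @ G.T)
print(np.allclose(P, recon))

# (25): p_ij/p_*j - p_i* = sum_alpha a_alpha(i) g_alpha(j) / lambda_alpha
print(np.allclose(P / c - r[:, None], (A / lam) @ G.T))
```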

3.3 Remarks

First, by the Lemma, TCA of $P$ and TNSCA of $P$ are equivalent: knowing one of them we can obtain the other; for example, in (24) and (25), the terms $\mathbf{g}_\alpha$ and $\lambda_\alpha$ are common to both methods, and $\mathbf{f}_\alpha = D_r^{-1}\mathbf{a}_\alpha$ for $\alpha = 1, \ldots, k$.

Second, TCA-raw of $P$, TCA of adjusted residuals of $P$ and TCA of $P$ are not equivalent.

Third, TCA stays as close as possible to the original data: it acts directly on the correspondence matrix $P$, in the sense that the basic taxicab decomposition is independent of the metrics; it is simply constructed from a sum of the signed columns or rows, because the normed principal axes are $\mathbf{u} \in \{-1,+1\}^J$ and $\mathbf{v} \in \{-1,+1\}^I$. Only the relative direction of the rows or columns is taken into account, without calculating a similarity (or dissimilarity) measure between the rows or columns. In the computation of CA, by contrast, the normed principal axes $\mathbf{u}$ and $\mathbf{v}$ are the eigenvectors of a similarity measure between the rows or columns and, more importantly, this similarity measure depends on the chosen metric. For instance, in the case of CA, in Step 1 we calculate the matrix of Pearson residuals, see Beh (2012),

$R = D_r^{-1/2}(P - \mathbf{r}\mathbf{c}')D_c^{-1/2},$

then, in Step 2, we calculate the normed principal axes uα via the eigen-equation,

$R'R\,\mathbf{u}_\alpha = \lambda_\alpha^2\, \mathbf{u}_\alpha,$

where the $(i, j)$th element of $R'R$ represents a similarity measure between the two column variables $i$ and $j$.
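The contrast can be made concrete with a short sketch (ours, assuming NumPy; the table is arbitrary): CA obtains its first normed axis as an eigenvector of the similarity matrix $R'R$, whereas TCA obtains a sign vector by directly maximizing the $L_1$ criterion, without forming any similarity matrix.

```python
import itertools
import numpy as np

N = np.array([[10., 5., 3., 2.],
              [ 4., 8., 6., 2.],
              [ 2., 3., 9., 6.]])
P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
resid = P - np.outer(r, c)

# CA, Step 1: Pearson residuals R = D_r^{-1/2}(P - rc') D_c^{-1/2};
# Step 2: the first normed axis u1 is an eigenvector of the similarity matrix R'R.
R = resid / np.sqrt(np.outer(r, c))
evals, evecs = np.linalg.eigh(R.T @ R)
u1_ca = evecs[:, np.argmax(evals)]       # L2-normed, depends on the chosen metrics

# TCA: the first normed axis is a sign vector maximizing ||Y5 u||_1 directly,
# without computing any similarity measure between the columns.
cands = [np.array(u) for u in itertools.product((-1.0, 1.0), repeat=P.shape[1])]
u1_tca = max(cands, key=lambda u: np.abs(resid @ u).sum())

print("CA axis (eigenvector of R'R):", np.round(u1_ca, 3))
print("TCA axis (signs):            ", u1_tca)
```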


4 Distance and geometric aspects of TCA

An important property of TCA and CA is that columns (or rows) with identical profiles (conditional probabilities) receive identical factor scores. Similarly, in TNSCA and NSCA columns with identical profiles (conditional probabilities) receive identical factor scores. The factor scores are used in the graphical displays. Moreover, merging identical profiles does not change the results of the data analysis; this is named the principle of equivalent partitioning by Nishisato (1984); it includes the famous distributional equivalence property, which is satisfied by CA. Here, we focus on new geometric and distance aspects of TCA and TNSCA, in comparison with the known results of CA and NSCA. We do not discuss data coding, which is a crucial preliminary step in CA and TCA, see in particular Murtagh (2005, ch. 3).

First, the underlying geometry in CA and in its variants is Euclidean with two quadratic forms; the distance interpretation of the graphical displays is based on the orthogonality of the factor scores and the Pythagorean theorem. No such interpretation exists in the taxicab approach. Benzécri (1973, pp. 31–32) based the data analysis on the choice of two metrics defined on the row and column spaces within the Euclidean geometry framework. For instance, in CA the Chi-squared distance between two profiles equals the ordinary squared Euclidean distance of the factor scores

$\sum_{j=1}^{J}\left(\dfrac{p_{ij}}{p_{i*}} - \dfrac{p_{i_1 j}}{p_{i_1 *}}\right)^2 / p_{*j} = \sum_{\alpha=1}^{k}\left(f_\alpha(i) - f_\alpha(i_1)\right)^2,$  (26)

and

$\sum_{i=1}^{I}\left(\dfrac{p_{ij}}{p_{*j}} - \dfrac{p_{i j_1}}{p_{* j_1}}\right)^2 / p_{i*} = \sum_{\alpha=1}^{k}\left(g_\alpha(j) - g_\alpha(j_1)\right)^2.$  (27)

Similarly, the Chi-squared distance between a profile and its center of gravity is:

$\sum_{j=1}^{J}\left(\dfrac{p_{ij}}{p_{i*}} - p_{*j}\right)^2 / p_{*j} = \sum_{\alpha=1}^{k} f_\alpha(i)^2,$  (28)

and

$\sum_{i=1}^{I}\left(\dfrac{p_{ij}}{p_{*j}} - p_{i*}\right)^2 / p_{i*} = \sum_{\alpha=1}^{k} g_\alpha(j)^2.$  (29)
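As a numerical check (ours, assuming NumPy; the table is arbitrary), the sketch below computes the CA row factor scores via the GSVD recipe of Sect. 2.2 and verifies identities (26) and (28) on a small contingency table.

```python
import numpy as np

N = np.array([[10., 5., 3., 2.],
              [ 4., 8., 6., 2.],
              [ 2., 3., 9., 6.]])
P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)

# CA = GSVD of Y1 w.r.t. D_r and D_c, computed via the SVD of the Pearson
# residuals X = D_r^{1/2} Y1 D_c^{1/2} = D_r^{-1/2}(P - rc') D_c^{-1/2} (Sect. 2.2).
X = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
V, lam, Ut = np.linalg.svd(X, full_matrices=False)
k = np.sum(lam > 1e-12)
F = (V[:, :k] * lam[:k]) / np.sqrt(r)[:, None]   # row factor scores f_alpha = D_r^{-1/2} a_alpha

profiles = P / r[:, None]                        # row profiles p_ij / p_i*

# (28): chi-squared distance of each row profile to its centre of gravity
lhs28 = np.sum((profiles - c) ** 2 / c, axis=1)
print(np.allclose(lhs28, np.sum(F ** 2, axis=1)))

# (26): chi-squared distance between the first two row profiles
lhs26 = np.sum((profiles[0] - profiles[1]) ** 2 / c)
print(np.isclose(lhs26, np.sum((F[0] - F[1]) ** 2)))
```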

The taxicab analogs of these four equations are the following four inequalities, obtained from (21, 22, 23 and 24):

$|f_1(i) - f_1(i_1)| \le \sum_{j=1}^{J}\left|\dfrac{p_{ij}}{p_{i*}} - \dfrac{p_{i_1 j}}{p_{i_1 *}}\right| \le \sum_{\alpha=1}^{k}|f_\alpha(i) - f_\alpha(i_1)|,$  (30)

$|g_1(j) - g_1(j_1)| \le \sum_{i=1}^{I}\left|\dfrac{p_{ij}}{p_{*j}} - \dfrac{p_{i j_1}}{p_{* j_1}}\right| \le \sum_{\alpha=1}^{k}|g_\alpha(j) - g_\alpha(j_1)|,$  (31)

$|f_1(i)| \le \sum_{j=1}^{J}\left|\dfrac{p_{ij}}{p_{i*}} - p_{*j}\right| \le \sum_{\alpha=1}^{k}|f_\alpha(i)|,$  (32)

and

$|g_1(j)| \le \sum_{i=1}^{I}\left|\dfrac{p_{ij}}{p_{*j}} - p_{i*}\right| \le \sum_{\alpha=1}^{k}|g_\alpha(j)|.$  (33)

Equations (30) through (33) show that the taxicab distance between any two points depends on the chosen axes, a well-known property of taxicab geometry: in fact, this property essentially distinguishes taxicab geometry from Euclidean geometry.

In the case of NSCA, the distance between two rows is the weighted Euclidean distance

$\sum_{j=1}^{J} p_{*j}\left(\dfrac{p_{ij}}{p_{*j}} - p_{i*} - \dfrac{p_{i_1 j}}{p_{*j}} + p_{i_1 *}\right)^2 = \sum_{\alpha=1}^{k}\left(a_\alpha(i) - a_\alpha(i_1)\right)^2,$  (34)

and the distance between two column profiles is the unweighted Euclidean distance

$\sum_{i=1}^{I}\left(\dfrac{p_{ij}}{p_{*j}} - \dfrac{p_{i j_1}}{p_{* j_1}}\right)^2 = \sum_{\alpha=1}^{k}\left(g_\alpha(j) - g_\alpha(j_1)\right)^2.$  (35)

The taxicab NSCA analogs of these two equations are the following two inequalities, obtained from (21, 22, 23 and 25):

$|a_1(i) - a_1(i_1)| \le \sum_{j=1}^{J}\left|\dfrac{p_{ij}}{p_{*j}} - p_{i*} - \dfrac{p_{i_1 j}}{p_{*j}} + p_{i_1 *}\right| \le \sum_{\alpha=1}^{k}|a_\alpha(i) - a_\alpha(i_1)|$

and (31).

5 Conclusion

In this paper we discussed some new additional aspects of TCA. In particular, we showed that TCA and TNSCA are equivalent, because of the taxicab decomposition. We also showed that TCA stays as close as possible to the original correspondence matrix without calculating a dissimilarity (or similarity) measure between rows or columns. Further, we discussed some distance and geometric aspects of TCA and TNSCA. Because the geometries of CA and TCA are different, they can produce different results; see for instance Choulakian (2006a, 2008a, b), Choulakian et al. (2006, 2013), and Choulakian and de Tibeiro (2013).

Based on our experience, we suggest the analysis of a data set by both methods, CA and TCA: like a cubist painting where an object is painted from different angles, each method sees the data from its point of view; sometimes the views are similar and at other times they are not. Similar to projection pursuit methods, we desire interesting projected views of the data. So by applying both methods we will get two projected views of the data; and consequently, we think that CA and TCA complement and enrich each other. Fichet (2009) described TCA as a general scoring method. However, Choulakian (2013) stated that for many data sets, properly coded, TCA is intimately related to the sum score statistic. For instance, in multiple TCA of the 0–1 indicator matrix of Q polytomous variables, the first factor score of the individuals can always be interpreted as the sum of Q Bernoulli variables; for more details refer to Choulakian et al. (2013). A program written in MATLAB is available from the authors.

Acknowledgments  The authors are grateful to the editor, Pr. A. Cerioli, the associate editor, and the two reviewers for their constructive comments, which improved the presentation of the paper. V. Choulakian's research is financed by NSERC of Canada.

Appendix

Singular value decomposition

Let $X$ be a data set of dimension $I \times J$, where $I$ observations are described by the $J$ variables. The ordinary SVD can be described as successive maximization of the $L_2$ norm of a linear combination of the columns of $X$ subject to a quadratic constraint; that is, it is based on the following optimization problem

$\max ||X\mathbf{u}||_2$ subject to $||\mathbf{u}||_2 = 1$;  (36)

or equivalently, it can also be described as maximization of the $L_2$ norm of a linear combination of the rows of $X$:

$\max ||X'\mathbf{v}||_2$ subject to $||\mathbf{v}||_2 = 1$.  (37)

Equation (36) is the dual of (37), and they can be reexpressed as matrix norms

$\lambda_1 = \max_{\mathbf{u} \in \mathbb{R}^J} \dfrac{||X\mathbf{u}||_2}{||\mathbf{u}||_2} = \max_{\mathbf{v} \in \mathbb{R}^I} \dfrac{||X'\mathbf{v}||_2}{||\mathbf{v}||_2} = \max_{\mathbf{u} \in \mathbb{R}^J,\, \mathbf{v} \in \mathbb{R}^I} \dfrac{\mathbf{v}'X\mathbf{u}}{||\mathbf{u}||_2\, ||\mathbf{v}||_2}.$  (38)

The solution to (38), $\lambda_1$, is the square root of the greatest eigenvalue of the matrix $X'X$ or $XX'$. The first principal axes, $\mathbf{u}_1$ and $\mathbf{v}_1$, can be defined as

$\mathbf{u}_1 = \arg\max_{\mathbf{u}} ||X\mathbf{u}||_2$ such that $||\mathbf{u}_1||_2 = 1,$  (39)


where $\mathbf{u}_1$ is the eigenvector of the matrix $X'X$ associated with the greatest eigenvalue $\lambda_1^2$; and

$\mathbf{v}_1 = \arg\max_{\mathbf{v}} ||X'\mathbf{v}||_2$ such that $||\mathbf{v}_1||_2 = 1.$  (40)

Let a1 and b1 be defined as

a1 = Xu1 and b1 = X′v1; (41)

then

$||\mathbf{a}_1||_2 = \mathbf{v}'_1\mathbf{a}_1 = ||\mathbf{b}_1||_2 = \mathbf{u}'_1\mathbf{b}_1 = \lambda_1.$  (42)

Equations (41) and (42) are named transition formulas, because $\mathbf{v}_1$ and $\mathbf{a}_1$, and $\mathbf{u}_1$ and $\mathbf{b}_1$, are related by

$\mathbf{u}_1 = \mathbf{b}_1/\lambda_1$ and $\mathbf{v}_1 = \mathbf{a}_1/\lambda_1.$  (43)

To obtain $\mathbf{a}_2$ and $\mathbf{b}_2$, and axes $\mathbf{u}_2$ and $\mathbf{v}_2$, we repeat the above procedure on the residual dataset

$X_2 = X_1 - \mathbf{a}_1\mathbf{b}'_1/\lambda_1,$  (44)

where $X_1 = X$. We note that $\mathrm{rank}(X_2) = \mathrm{rank}(X_1) - 1$, because by (41) and (42)

$X_2\mathbf{u}_1 = \mathbf{0}$ and $X'_2\mathbf{v}_1 = \mathbf{0}.$  (45)

Classical SVD can be described as the sequential repetition of the above procedure for $k = \mathrm{rank}(X)$ times till the residual matrix becomes $\mathbf{0}$; thus, using $\alpha = 1, \ldots, k$ as indices, the matrix $X$ can be written as

$X = \sum_{\alpha=1}^{k} \mathbf{a}_\alpha \mathbf{b}'_\alpha / \lambda_\alpha,$  (46)

which, by (43), can be rewritten in a much more familiar form

$X = \sum_{\alpha=1}^{k} \lambda_\alpha \mathbf{v}_\alpha \mathbf{u}'_\alpha.$  (47)

Further, we have

$\lambda_\alpha = ||\mathbf{a}_\alpha||_2 = ||\mathbf{b}_\alpha||_2$, and the $\lambda_\alpha$ are decreasing for $\alpha = 1, \ldots, k$;  (48)

and

$\mathrm{Tr}(X'X) = \mathrm{Tr}(XX') = \sum_{\alpha=1}^{k} \lambda_\alpha^2 = \sum_{\alpha=1}^{k} ||\mathbf{a}_\alpha||_2^2 = \sum_{\alpha=1}^{k} ||\mathbf{b}_\alpha||_2^2.$  (49)


Taxicab singular value decomposition

TSVD consists of maximizing the $L_1$ norm of a linear combination of the columns of $X$ subject to the $L_\infty$ norm constraint; more precisely, it is based on the following optimization problem

$\max ||X\mathbf{u}||_1$ subject to $||\mathbf{u}||_\infty = 1$;  (50)

or equivalently, it can also be described as maximization of the $L_1$ norm of a linear combination of the rows of the matrix $X$:

$\max ||X'\mathbf{v}||_1$ subject to $||\mathbf{v}||_\infty = 1.$  (51)

Equation (50) is the dual of (51), and they can be reexpressed as matrix norms

$\lambda_1 = \max_{\mathbf{u} \in \mathbb{R}^J} \dfrac{||X\mathbf{u}||_1}{||\mathbf{u}||_\infty} = \max_{\mathbf{v} \in \mathbb{R}^I} \dfrac{||X'\mathbf{v}||_1}{||\mathbf{v}||_\infty} = \max_{\mathbf{u} \in \mathbb{R}^J,\, \mathbf{v} \in \mathbb{R}^I} \dfrac{\mathbf{v}'X\mathbf{u}}{||\mathbf{u}||_\infty\, ||\mathbf{v}||_\infty},$  (52)

which is a well-known and much discussed matrix norm related to the Grothendieck problem, see for instance Alon and Naor (2006). The solution to (52), $\lambda_1$, is obtained from the combinatorial optimization problem

$\max ||X\mathbf{u}||_1$ subject to $\mathbf{u} \in \{-1,+1\}^J.$  (53)

Equation (53) characterizes the robustness of the method, in the sense that the weights assigned to the columns (similarly to the rows, by duality) are uniformly distributed on $\{-1,+1\}$. The vectors $\mathbf{u}_1$ and $\mathbf{v}_1$ are defined as

$\mathbf{u}_1 = \arg\max_{\mathbf{u}} ||X\mathbf{u}||_1$ such that $||\mathbf{u}_1||_\infty = 1,$  (54)

and

$\mathbf{v}_1 = \arg\max_{\mathbf{v}} ||X'\mathbf{v}||_1$ such that $||\mathbf{v}_1||_\infty = 1.$  (55)

Let $\mathbf{a}_1$ and $\mathbf{b}_1$ be

$\mathbf{a}_1 = X\mathbf{u}_1$ and $\mathbf{b}_1 = X'\mathbf{v}_1$;  (56)

then

$||\mathbf{a}_1||_1 = \mathbf{v}'_1\mathbf{a}_1 = ||\mathbf{b}_1||_1 = \mathbf{u}'_1\mathbf{b}_1 = \lambda_1.$  (57)

Equations (56) and (57) are named transition formulas, because $\mathbf{v}_1$ and $\mathbf{a}_1$, and $\mathbf{u}_1$ and $\mathbf{b}_1$, are related by

$\mathbf{u}_1 = \mathrm{sgn}(\mathbf{b}_1)$ and $\mathbf{v}_1 = \mathrm{sgn}(\mathbf{a}_1),$  (58)

where $\mathrm{sgn}(\mathbf{g}_1) = (\mathrm{sgn}(g_1(1)), \ldots, \mathrm{sgn}(g_1(J)))'$, and $\mathrm{sgn}(g_1(j)) = 1$ if $g_1(j) > 0$, $\mathrm{sgn}(g_1(j)) = -1$ otherwise. Note that (58) is completely different from (43).

To obtain $\mathbf{a}_2$, $\mathbf{b}_2$, and axes $\mathbf{u}_2$ and $\mathbf{v}_2$, we repeat the above procedure on the residual dataset

$X_2 = X_1 - X_1\mathbf{u}_1\mathbf{v}'_1X_1/\lambda_1 = X_1 - \mathbf{a}_1\mathbf{b}'_1/\lambda_1,$  (59)

where $X_1 = X$. We note that $\mathrm{rank}(X_2) = \mathrm{rank}(X_1) - 1$, because by (56), (57) and (58)

$X_2\mathbf{u}_1 = \mathbf{0}$ and $X'_2\mathbf{v}_1 = \mathbf{0};$  (60)

which implies that

$\mathbf{u}'_1\mathbf{b}_\alpha = 0$ and $\mathbf{v}'_1\mathbf{a}_\alpha = 0$ for $\alpha = 2, \ldots, k.$  (61)

TSVD is described as the sequential repetition of the above procedure for $k = \mathrm{rank}(X)$ times till the residual matrix becomes $\mathbf{0}$; thus the matrix $X$ can be written as

$X = \sum_{\alpha=1}^{k} \mathbf{a}_\alpha \mathbf{b}'_\alpha / \lambda_\alpha.$  (62)

It is important to note that (62) has the same form as (4) and (46), but it cannot be rewritten as (47).

Further, similar to (57), we have

λα = ||aα||1 = ||bα||1 for α = 1, . . . , k. (63)

But the dispersion measures $\lambda_\alpha$ in (63) will not satisfy (49), because the Pythagorean theorem is not satisfied in $L_1$.

In TSVD, the optimization problem (50), (51) or (52) can be solved by two algorithms. The first one is based on the complete enumeration (53); this can be applied, with the present state of desktop computing power, say, if $\min(I, J) \le 25$. The second one is based on iterating the transition formulas (56), (57) and (58), similar to Wold's (1966) NIPALS (nonlinear iterative partial alternating least squares) algorithm, also named criss-cross regression by Gabriel and Zamir (1979). It is easy to show that this is also an ascent algorithm. The criss-cross algorithm is nonlinear and can be summarized in the following way, where $\mathbf{b}$ is a starting value:

Step 1: $\mathbf{u} = \mathrm{sgn}(\mathbf{b})$, $\mathbf{a} = X\mathbf{u}$ and $\lambda(\mathbf{u}) = ||X\mathbf{u}||_1$;
Step 2: $\mathbf{v} = \mathrm{sgn}(\mathbf{a})$, $\mathbf{b} = X'\mathbf{v}$ and $\lambda(\mathbf{v}) = ||X'\mathbf{v}||_1$;
Step 3: If $\lambda(\mathbf{v}) - \lambda(\mathbf{u}) > 0$, go to Step 1; otherwise, stop.

This is an ascent algorithm; that is, it increases the value of the objective function $\lambda$ at each iteration. The convergence of the algorithm is superlinear (very fast, at most two iterations); however, it could converge to a local maximum, so we restart the algorithm $I$ times, using each row of $X$ as a starting value.

The iterative algorithm is statistically consistent in the sense that, as the sample size increases, there will be some observations in the direction of the principal axes, so the algorithm will find the optimal solution.
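A minimal NumPy sketch of this criss-cross procedure with the restart strategy described above is given below (ours, not the authors' MATLAB program). For a small random matrix it reconstructs X as in (62), and the first taxicab singular value can be compared with the complete enumeration (53); since the iterations may stop at a local maximum, the two values usually, but not always, coincide.

```python
import itertools
import numpy as np

def criss_cross_step(X, tol=1e-12):
    """One taxicab step: the ascent iterations of Steps 1-3, restarted from each
    row of X used as the starting value b; returns the best (lambda, a, b, u, v)."""
    best_lam, best = -1.0, None
    for b0 in X:
        b = b0.copy()
        while True:
            u = np.where(b > 0, 1.0, -1.0)       # Step 1: u = sgn(b)
            a = X @ u
            lam_u = np.abs(a).sum()
            v = np.where(a > 0, 1.0, -1.0)       # Step 2: v = sgn(a)
            b = X.T @ v
            lam_v = np.abs(b).sum()
            if lam_v - lam_u <= tol:             # Step 3: no further ascent, stop
                break
        if lam_v > best_lam:
            best_lam, best = lam_v, (a, b, u, v)
    return (best_lam,) + best

def tsvd_criss_cross(Y):
    """Full TSVD (62): repeat the criss-cross step and deflate as in (59)."""
    X = np.array(Y, dtype=float)
    A, B, lams = [], [], []
    for _ in range(min(X.shape)):
        lam, a, b, _, _ = criss_cross_step(X)
        if lam < 1e-10:
            break
        A.append(a); B.append(b); lams.append(lam)
        X = X - np.outer(a, b) / lam             # deflation, eq. (59)
    return np.array(A).T, np.array(B).T, np.array(lams)

rng = np.random.default_rng(2)
X = rng.normal(size=(8, 5))
A, B, lams = tsvd_criss_cross(X)

# Reconstruction (62): X = sum_alpha a_alpha b_alpha' / lambda_alpha
print(np.allclose(X, (A / lams) @ B.T))

# First taxicab singular value vs. complete enumeration (53)
brute = max(np.abs(X @ np.array(u)).sum()
            for u in itertools.product((-1.0, 1.0), repeat=X.shape[1]))
print(lams[0], brute)
```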

References

Agresti A (2002) Categorical data analysis, 2nd edn. Wiley, New Jersey
Alon N, Naor A (2006) Approximating the cut-norm via Grothendieck's inequality. SIAM J Comput 35:787–803
Balbi S (1998) Graphical displays in nonsymmetrical correspondence analysis. In: Blasius J, Greenacre M (eds) Visualization of categorical data. Academic Press, London, pp 297–309
Beh E (2012) Simple correspondence analysis using adjusted residuals. J Stat Plan Inference 142:965–973
Benzécri JP (1973) L'Analyse des Données, vol 2: L'Analyse des Correspondances. Dunod, Paris
Benzécri JP (1992) Correspondence analysis handbook. Marcel Dekker, New York
Choulakian V (2003) The optimality of the centroid method. Psychometrika 68:473–475
Choulakian V (2005) Transposition invariant principal component analysis in L1 for long tailed data. Stat Probab Lett 71:23–31
Choulakian V (2006a) Taxicab correspondence analysis. Psychometrika 71:333–345
Choulakian V (2006b) L1 norm projection pursuit principal component analysis. Comput Stat Data Anal 50:1441–1451
Choulakian V (2008a) Taxicab correspondence analysis of contingency tables with one heavyweight column. Psychometrika 73:309–319
Choulakian V (2008b) Multiple taxicab correspondence analysis. Adv Data Anal Classif 2:177–206
Choulakian V (2013) The simple sum score statistic in taxicab correspondence analysis. In: Brentari E, Carpita M (eds) Advances in latent variables (eBook). Vita e Pensiero, Milan, ISBN 9788834325568
Choulakian V, de Tibeiro J (2013) Graph partitioning by correspondence analysis and taxicab correspondence analysis. J Classif 30(3):397–427
Choulakian V, Allard J, Simonetti B (2013) Multiple taxicab correspondence analysis of a survey related to health services. J Data Sci 11(2):205–229
Choulakian V, Kasparian S, Miyake M, Akama H, Makoshi N, Nakagawa M (2006) A statistical analysis of synoptic gospels. In: Viprey JR (ed) Proceedings of the 8th international conference on textual data (JADT'2006). Presses Universitaires de Franche-Comté, pp 281–288
Fichet B (2009) Metrics of Lp-type and distributional equivalence principle. Adv Data Anal Classif 3:305–314
Gabriel KR, Zamir S (1979) Lower rank approximation of matrices by least squares with any choice of weights. Technometrics 21:489–498
Gifi A (1990) Nonlinear multivariate analysis. Wiley, New York
Greenacre M (1984) Theory and applications of correspondence analysis. Academic Press, London
Greenacre M (2010) Correspondence analysis of raw data. Ecology 91(4):958–963
Haberman SJ (1973) The analysis of residuals in cross-classified tables. Biometrics 75:457–467
Kreyszig E (1978) Introduction to functional analysis with applications. Wiley, New York
Lauro NC, D'Ambra L (1984) L'analyse non symétrique des correspondances. In: Diday E et al (eds) Data analysis and informatics. North-Holland, Amsterdam, pp 433–446
Le Roux B, Rouanet H (2004) Geometric data analysis. From correspondence analysis to structured data analysis. Kluwer–Springer, Dordrecht
Murtagh F (2005) Correspondence analysis and data coding with Java and R. Chapman & Hall/CRC, Boca Raton
Nishisato S (1984) Forced classification: a simple application of a quantification method. Psychometrika 49(1):25–36
Nishisato S (1994) Elements of dual scaling: an introduction to practical data analysis. Lawrence Erlbaum, Hillsdale
Takane Y, Jung S (2009) Tests of ignoring and eliminating in nonsymmetric correspondence analysis. Adv Data Anal Classif 3(3):315–340
Wold H (1966) Estimation of principal components and related models by iterative least squares. In: Krishnaiah PR (ed) Multivariate analysis. Academic Press, New York, pp 391–420
