![Page 1: Correspondence Analysis Ahmed Rebai Center of Biotechnology of Sfax](https://reader034.vdocuments.net/reader034/viewer/2022051416/56649f1e5503460f94c35c6c/html5/thumbnails/1.jpg)
Correspondence Analysis
Ahmed Rebai
Center of Biotechnology of Sfax
![Page 2: Correspondence Analysis Ahmed Rebai Center of Biotechnology of Sfax](https://reader034.vdocuments.net/reader034/viewer/2022051416/56649f1e5503460f94c35c6c/html5/thumbnails/2.jpg)
Correspondence analysis
Introduced by Benzécri (1973) for uncovering and understanding the structure and patterns of data in contingency tables.
Involves finding coordinate values that represent the row and column categories in some optimal way.
![Page 3: Correspondence Analysis Ahmed Rebai Center of Biotechnology of Sfax](https://reader034.vdocuments.net/reader034/viewer/2022051416/56649f1e5503460f94c35c6c/html5/thumbnails/3.jpg)
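Contingency tables
Table with r rows (categories of X2) and c columns (categories of X1)

| X2 \ X1 | 1 | … | j | … | c | Total |
|---|---|---|---|---|---|---|
| 1 | $N_{11}$ | … | $N_{1j}$ | … | $N_{1c}$ | $N_{1.}$ |
| 2 | $N_{21}$ | … | … | … | … | … |
| ⋮ | ⋮ | | ⋮ | | ⋮ | ⋮ |
| i | … | … | $N_{ij}$ | … | … | … |
| ⋮ | ⋮ | | ⋮ | | ⋮ | ⋮ |
| r | $N_{r1}$ | … | … | … | $N_{rc}$ | $N_{r.}$ |
| Total | $N_{.1}$ | … | $N_{.j}$ | … | $N_{.c}$ | $N_{..}$ |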
![Page 4: Correspondence Analysis Ahmed Rebai Center of Biotechnology of Sfax](https://reader034.vdocuments.net/reader034/viewer/2022051416/56649f1e5503460f94c35c6c/html5/thumbnails/4.jpg)
Main idea
Develop simple indices that show the relation between rows and columns.
The indices tell us simultaneously which columns have more weight in a row category, and vice versa.
Reduce dimensionality, as in PCA.
Indices are extracted in decreasing order of importance.
![Page 5: Correspondence Analysis Ahmed Rebai Center of Biotechnology of Sfax](https://reader034.vdocuments.net/reader034/viewer/2022051416/56649f1e5503460f94c35c6c/html5/thumbnails/5.jpg)
Which criteria?
In a contingency table, global independence between the two variables is generally measured by a chi-square statistic ($\chi^2$), calculated as:

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(N_{ij} - E_{ij})^2}{E_{ij}}$$

where $E_{ij}$ are the expected counts under independence:

$$E_{ij} = \frac{N_{i.}\, N_{.j}}{N_{..}}$$
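As a minimal sketch of the criterion, the expected counts and the chi-square statistic can be computed directly from a table of counts. The table `toy_table` and the function names `expected_counts` and `chi_square` are hypothetical and only serve the illustration; they are not taken from the slides.

```python
import numpy as np

def expected_counts(N):
    """Expected counts under independence: E_ij = N_i. * N_.j / N.."""
    N = np.asarray(N, dtype=float)
    row_totals = N.sum(axis=1, keepdims=True)   # N_i. as a column vector
    col_totals = N.sum(axis=0, keepdims=True)   # N_.j as a row vector
    return row_totals * col_totals / N.sum()

def chi_square(N):
    """Pearson chi-square statistic of a two-way table of counts."""
    N = np.asarray(N, dtype=float)
    E = expected_counts(N)
    return float(((N - E) ** 2 / E).sum())

# Hypothetical 3 x 3 table of counts, for illustration only.
toy_table = np.array([[20, 10, 5],
                      [10, 20, 10],
                      [5, 10, 30]])

E = expected_counts(toy_table)
chi2 = chi_square(toy_table)
print(E.round(2))
print(round(chi2, 2))
```

The expected table keeps the same row and column totals as the observed one, and a table whose rows are exactly proportional gives a statistic of zero.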
![Page 6: Correspondence Analysis Ahmed Rebai Center of Biotechnology of Sfax](https://reader034.vdocuments.net/reader034/viewer/2022051416/56649f1e5503460f94c35c6c/html5/thumbnails/6.jpg)
Decomposition of $\chi^2$
We have a departure from independence and we want to know why.
To find the factors we use the matrix C of dimension (r × c) with elements

$$c_{ij} = \frac{N_{ij} - E_{ij}}{\sqrt{E_{ij}}}$$
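A short sketch of how the matrix C could be built, assuming its elements are the residuals (N_ij - E_ij) divided by the square root of E_ij, so that their squares sum to the chi-square. The name `residual_matrix` and the toy counts are assumptions made for the example.

```python
import numpy as np

def residual_matrix(N):
    """Matrix C with elements c_ij = (N_ij - E_ij) / sqrt(E_ij)."""
    N = np.asarray(N, dtype=float)
    # Expected counts under independence
    E = N.sum(axis=1, keepdims=True) * N.sum(axis=0, keepdims=True) / N.sum()
    return (N - E) / np.sqrt(E)

# Hypothetical 3 x 3 table of counts, for illustration only.
toy_table = np.array([[20, 10, 5],
                      [10, 20, 10],
                      [5, 10, 30]])

C = residual_matrix(toy_table)
# A positive entry marks a cell that is more frequent than expected
# under independence, a negative entry a cell that is less frequent.
print(C.round(2))
```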
![Page 7: Correspondence Analysis Ahmed Rebai Center of Biotechnology of Sfax](https://reader034.vdocuments.net/reader034/viewer/2022051416/56649f1e5503460f94c35c6c/html5/thumbnails/7.jpg)
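How to find factors?
Singular value decomposition (SVD) of the matrix C, that is, find matrices U, D and V such that

$$C = U D V^{T}$$

The columns of U are eigenvectors of $CC^{T}$.
The columns of V are eigenvectors of $C^{T}C$.
D is a diagonal matrix of $\sqrt{\lambda_k}$, where the $\lambda_k$ are the eigenvalues of $CC^{T}$.
$k = 1, \dots, \mathrm{rank}(C)$, with $\mathrm{rank}(C) \le \min(r-1, c-1)$.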
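The decomposition can be sketched with NumPy's `np.linalg.svd`, which returns the singular values (the square roots of the eigenvalues) in decreasing order. The toy table and the helper `residual_matrix` are hypothetical; only the SVD call itself is a real library function.

```python
import numpy as np

def residual_matrix(N):
    """Matrix C with elements c_ij = (N_ij - E_ij) / sqrt(E_ij)."""
    N = np.asarray(N, dtype=float)
    E = N.sum(axis=1, keepdims=True) * N.sum(axis=0, keepdims=True) / N.sum()
    return (N - E) / np.sqrt(E)

# Hypothetical 3 x 3 table of counts, for illustration only.
toy_table = np.array([[20, 10, 5],
                      [10, 20, 10],
                      [5, 10, 30]])

C = residual_matrix(toy_table)

# C = U @ diag(d) @ Vt, with d sorted in decreasing order
U, d, Vt = np.linalg.svd(C, full_matrices=False)
eigenvalues = d ** 2          # eigenvalues of C C^T (and of C^T C)

print(d.round(4))
print(eigenvalues.round(4))
```

For an r × c table at most min(r-1, c-1) singular values are nonzero, so in this 3 × 3 sketch the last singular value is zero up to rounding error.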
![Page 8: Correspondence Analysis Ahmed Rebai Center of Biotechnology of Sfax](https://reader034.vdocuments.net/reader034/viewer/2022051416/56649f1e5503460f94c35c6c/html5/thumbnails/8.jpg)
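$$\mathrm{Tr}(CC^{T}) = \sum_k \lambda_k = \chi^2 = \sum_{i,j} c_{ij}^2$$

The projections of the rows and the columns are given by the eigenvectors $U_k$ and $V_k$, which are linked by

$$C V_k = \sqrt{\lambda_k}\, U_k$$

$$C^{T} U_k = \sqrt{\lambda_k}\, V_k$$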
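These identities can be checked numerically. The sketch below assumes C = U D V^T with C of size r × c, in which case the relations between the two sets of vectors read C V_k = sqrt(lambda_k) U_k and C^T U_k = sqrt(lambda_k) V_k. The toy table and helper name are hypothetical.

```python
import numpy as np

def residual_matrix(N):
    """Matrix C with elements c_ij = (N_ij - E_ij) / sqrt(E_ij)."""
    N = np.asarray(N, dtype=float)
    E = N.sum(axis=1, keepdims=True) * N.sum(axis=0, keepdims=True) / N.sum()
    return (N - E) / np.sqrt(E)

# Hypothetical 3 x 3 table of counts, for illustration only.
toy_table = np.array([[20, 10, 5],
                      [10, 20, 10],
                      [5, 10, 30]])

C = residual_matrix(toy_table)
U, d, Vt = np.linalg.svd(C, full_matrices=False)
V = Vt.T

trace = np.trace(C @ C.T)        # Tr(C C^T)
sum_eigenvalues = (d ** 2).sum() # sum of the lambda_k
chi2 = (C ** 2).sum()            # sum of the squared c_ij

# Column k of C @ V equals sqrt(lambda_k) times column k of U, and vice versa.
rows_from_cols = np.allclose(C @ V, U * d)
cols_from_rows = np.allclose(C.T @ U, V * d)

print(trace, sum_eigenvalues, chi2)
print(rows_from_cols, cols_from_rows)
```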
![Page 9: Correspondence Analysis Ahmed Rebai Center of Biotechnology of Sfax](https://reader034.vdocuments.net/reader034/viewer/2022051416/56649f1e5503460f94c35c6c/html5/thumbnails/9.jpg)
How many factors?
The adequacy of the representation by the first two coordinates is measured by the percentage of explained inertia:

$$\frac{\lambda_1 + \lambda_2}{\sum_k \lambda_k}$$

In general, the rows are displayed on $(U_1, U_2)$ and the columns on $(V_1, V_2)$.
The proximity between row and column points is to be interpreted.
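A minimal sketch of the explained-inertia ratio, computed from the squared singular values of C. The function name `explained_inertia`, its `n_axes` parameter and the 4 × 4 table are assumptions made for the example.

```python
import numpy as np

def residual_matrix(N):
    """Matrix C with elements c_ij = (N_ij - E_ij) / sqrt(E_ij)."""
    N = np.asarray(N, dtype=float)
    E = N.sum(axis=1, keepdims=True) * N.sum(axis=0, keepdims=True) / N.sum()
    return (N - E) / np.sqrt(E)

def explained_inertia(N, n_axes=2):
    """Share of the total inertia carried by the first n_axes factors."""
    d = np.linalg.svd(residual_matrix(N), compute_uv=False)
    eigenvalues = d ** 2
    return float(eigenvalues[:n_axes].sum() / eigenvalues.sum())

# Hypothetical 4 x 4 table of counts, for illustration only.
toy_table = np.array([[20, 10, 5, 5],
                      [10, 20, 10, 5],
                      [5, 10, 30, 10],
                      [5, 5, 10, 25]])

share_1 = explained_inertia(toy_table, n_axes=1)
share_2 = explained_inertia(toy_table, n_axes=2)
share_3 = explained_inertia(toy_table, n_axes=3)
print(share_1, share_2, share_3)
```

With a 4 × 4 table there are at most three factors, so three axes always account for all of the inertia. The question in practice is how close the first two already come to that.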
![Page 10: Correspondence Analysis Ahmed Rebai Center of Biotechnology of Sfax](https://reader034.vdocuments.net/reader034/viewer/2022051416/56649f1e5503460f94c35c6c/html5/thumbnails/10.jpg)
CA in practice
Proximity of two rows (columns) indicates a similar profile, that is, a similar conditional frequency distribution: the two rows (columns) are close to proportional.
The origin is the average on each factor, so a point (row or column) close to the origin has an average profile.
Proximity of a row to a column indicates that this row has a particularly large weight in this column (if both are far from the origin).
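The first two rules can be illustrated on a toy table in which one row is an exact multiple of another. The sketch assumes the usual rescaling of row coordinates by the square root of the row masses (principal coordinates), which the slides do not spell out. The table and the name `row_coordinates` are hypothetical.

```python
import numpy as np

def row_coordinates(N):
    """Row masses and principal row coordinates (assumed standard CA scaling)."""
    N = np.asarray(N, dtype=float)
    P = N / N.sum()                       # relative frequencies
    r = P.sum(axis=1)                     # row masses
    c = P.sum(axis=0)                     # column masses
    # Standardized residuals: the matrix C divided by sqrt(N..)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    F = U * s / np.sqrt(r)[:, None]       # one row of F per table row
    return r, F

# Hypothetical table: the second row is exactly twice the first row.
toy_table = np.array([[10, 20, 30],
                      [20, 40, 60],
                      [30, 10, 5],
                      [5, 25, 10]])

masses, F = row_coordinates(toy_table)
same_point = np.allclose(F[0], F[1])      # proportional rows coincide
centroid = masses @ F                     # mass-weighted mean of the row points
print(F[:, :2].round(3))
print(same_point, centroid.round(12))
```

Proportional rows have the same profile and therefore land on the same point, and the mass-weighted average of all row points sits at the origin.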
![Page 11: Correspondence Analysis Ahmed Rebai Center of Biotechnology of Sfax](https://reader034.vdocuments.net/reader034/viewer/2022051416/56649f1e5503460f94c35c6c/html5/thumbnails/11.jpg)
A first example: French Bac
![Page 12: Correspondence Analysis Ahmed Rebai Center of Biotechnology of Sfax](https://reader034.vdocuments.net/reader034/viewer/2022051416/56649f1e5503460f94c35c6c/html5/thumbnails/12.jpg)
Eigenvalues
![Page 13: Correspondence Analysis Ahmed Rebai Center of Biotechnology of Sfax](https://reader034.vdocuments.net/reader034/viewer/2022051416/56649f1e5503460f94c35c6c/html5/thumbnails/13.jpg)
With Corsica
![Page 14: Correspondence Analysis Ahmed Rebai Center of Biotechnology of Sfax](https://reader034.vdocuments.net/reader034/viewer/2022051416/56649f1e5503460f94c35c6c/html5/thumbnails/14.jpg)
Without Corsica
Plot labels: Classical bac, Technical bac
![Page 15: Correspondence Analysis Ahmed Rebai Center of Biotechnology of Sfax](https://reader034.vdocuments.net/reader034/viewer/2022051416/56649f1e5503460f94c35c6c/html5/thumbnails/15.jpg)
Coefficients for regions
![Page 16: Correspondence Analysis Ahmed Rebai Center of Biotechnology of Sfax](https://reader034.vdocuments.net/reader034/viewer/2022051416/56649f1e5503460f94c35c6c/html5/thumbnails/16.jpg)
Coefficients for Bac Type
![Page 17: Correspondence Analysis Ahmed Rebai Center of Biotechnology of Sfax](https://reader034.vdocuments.net/reader034/viewer/2022051416/56649f1e5503460f94c35c6c/html5/thumbnails/17.jpg)
Properties of CA
Allows consideration of dummy variables (called ‘illustrative variables’) as additional variables, which do not contribute to the construction of the factorial space but can be displayed in it.
With such a representation it is possible to determine the proximity between observations and variables, and between the illustrative variables and the observations.
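One common way to display an illustrative element is to project its profile onto axes built from the active table only. The sketch below does this for an extra row, assuming the standard CA transition formula (row profile times standard column coordinates). An illustrative column would be handled symmetrically. The function names and the toy counts are hypothetical.

```python
import numpy as np

def ca_fit(N):
    """Masses and SVD factors of the standardized residuals of the active table."""
    N = np.asarray(N, dtype=float)
    P = N / N.sum()
    r = P.sum(axis=1)                     # row masses
    c = P.sum(axis=0)                     # column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    return r, c, U, s, Vt

def active_row_coordinates(N, n_axes=2):
    """Principal coordinates of the rows that build the factorial space."""
    r, c, U, s, Vt = ca_fit(N)
    return (U * s / np.sqrt(r)[:, None])[:, :n_axes]

def project_supplementary_rows(N_active, N_sup, n_axes=2):
    """Coordinates of extra rows that do not influence the axes."""
    r, c, U, s, Vt = ca_fit(N_active)
    gamma = Vt.T / np.sqrt(c)[:, None]    # standard column coordinates
    N_sup = np.atleast_2d(np.asarray(N_sup, dtype=float))
    profiles = N_sup / N_sup.sum(axis=1, keepdims=True)
    return (profiles @ gamma)[:, :n_axes]

# Hypothetical active table and one hypothetical illustrative row.
active = np.array([[20, 10, 5],
                   [10, 20, 10],
                   [5, 10, 30]])
illustrative_row = np.array([8, 8, 4])

coords_active = active_row_coordinates(active)
coords_sup = project_supplementary_rows(active, illustrative_row)
print(coords_active.round(3))
print(coords_sup.round(3))
```

A quick sanity check on the design: projecting an active row as if it were illustrative returns its own coordinates, and a row with the average profile lands at the origin.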
![Page 18: Correspondence Analysis Ahmed Rebai Center of Biotechnology of Sfax](https://reader034.vdocuments.net/reader034/viewer/2022051416/56649f1e5503460f94c35c6c/html5/thumbnails/18.jpg)
Tekaia and Yeramian (2006)
208 predicted proteomes representing the three phylogenetic domains and various lifestyles (hyperthermophiles, thermophiles, psychrophiles and mesophiles, including eukaryotes)
Variables: amino-acid composition of proteomes
Illustrative variables: groups of amino acids (charged, polar, hydrophobic)
![Page 19: Correspondence Analysis Ahmed Rebai Center of Biotechnology of Sfax](https://reader034.vdocuments.net/reader034/viewer/2022051416/56649f1e5503460f94c35c6c/html5/thumbnails/19.jpg)
Why CA?
To analyze the distribution of species in terms of global properties and discriminated groups
To search for amino-acid signatures in groups of species
To try to understand potential evolutionary trends
![Page 20: Correspondence Analysis Ahmed Rebai Center of Biotechnology of Sfax](https://reader034.vdocuments.net/reader034/viewer/2022051416/56649f1e5503460f94c35c6c/html5/thumbnails/20.jpg)
![Page 21: Correspondence Analysis Ahmed Rebai Center of Biotechnology of Sfax](https://reader034.vdocuments.net/reader034/viewer/2022051416/56649f1e5503460f94c35c6c/html5/thumbnails/21.jpg)
Results
First axis (63%) corresponds to GC content (from Mycoplasma (23%) to Streptomyces (72%))
Second axis (14%) corresponds to optimal growth temperature
![Page 22: Correspondence Analysis Ahmed Rebai Center of Biotechnology of Sfax](https://reader034.vdocuments.net/reader034/viewer/2022051416/56649f1e5503460f94c35c6c/html5/thumbnails/22.jpg)
![Page 23: Correspondence Analysis Ahmed Rebai Center of Biotechnology of Sfax](https://reader034.vdocuments.net/reader034/viewer/2022051416/56649f1e5503460f94c35c6c/html5/thumbnails/23.jpg)
![Page 24: Correspondence Analysis Ahmed Rebai Center of Biotechnology of Sfax](https://reader034.vdocuments.net/reader034/viewer/2022051416/56649f1e5503460f94c35c6c/html5/thumbnails/24.jpg)