![Page 1: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/1.jpg)
MULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6th, 2017
Presented by: Ana Cuciureanu and Shamina S Prova BIOL 5081: Introduction to Biostatistics Professor: Dr. Joel Shore
![Page 2: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/2.jpg)
Multivariate Analysis (MVA) • Complex systems require multiple and different kind of measurements to be taken in order to best describe reality
• MVA is the investigation of many variables, simultaneously, in order to understand the relationships that may exist between variables
• MVA can be as simple as analysing two variables right up to millions
2 Introduction
![Page 3: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/3.jpg)
Multivariate Analysis (MVA) • Is the study of variability and its sources
• Shows the influence of both, wanted and unwanted variability
• Wanted the effect of variables on the relationship between data points
• Unwanted random variability resulting from experimental features that cannot be controlled
• Used to predict future events
3 Introduction
![Page 4: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/4.jpg)
Types of MVA • Exploratory Data Analysis (EDA)
• Deeper insight into large, complex data sets
• i.e. Principle Component Analysis, Cluster Analysis
• Regression Analysis
• Classification • Identifies new or existing classes
• i.e. Cluster Analysis
4 Introduction
![Page 5: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/5.jpg)
MVA vs Classical Statistics • How would you analyze 50 rows and 10 columns of data? • Plot columns together two at a time
• Plot each variable for all samples and look for trends
• This univariate analysis is too simplistic, frustrating and fails to detect the relationship between variants (i.e. covariance and correlation)
5 Introduction
![Page 6: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/6.jpg)
Benefits of MVA • Identifies variables that contribute most to the overall variability in the data
• Helps isolate those variables that co-vary with each other
• A picture is worth a thousand words and helps understanding the data
6 Introduction
![Page 7: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/7.jpg)
Benefits of MVA
7 Introduction
![Page 8: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/8.jpg)
Applications of MVA • Pharmaceutical and biotechnological tests
• Agricultural analysis
• Business intelligence and marketing
• Spectroscopic applications
• Genetics and metabolism
• Etc.
8 Introduction
![Page 9: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/9.jpg)
Cell Example • Imagine you have a dish with a bunch of different cells, but you don’t know how they are characterized
• You decide that the best way to characterize these cell is to measure the mRNA expression of multiple genes
• However, there are too many measurements
• Conclusion: have to use MVA
9 Introduction
![Page 10: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/10.jpg)
Cell Example cells = subjects genes = variables
Gene cell 1 cell 2 cell 3 cell 4 cell 5 cell 6 cell 7 cell 8 cell 9 cell 10
a 12 8 12 8 20 8 8 20 8 24 b 28 28 28 28 0 8 16 12 20 16 c 16 16 16 12 16 16 16 16 16 12 d 20 20 20 20 8 20 20 8 24 8 e 28 24 24 24 4 12 8 20 12 8 f 4 32 12 12 28 0 8 16 16 4 g 18 12 18 12 30 12 12 30 12 36 h 42 42 42 42 0 12 24 18 30 24 i 24 24 24 18 24 24 24 24 24 18 j 30 30 30 30 12 30 30 12 36 12 k 8 12 8 8 20 0 4 28 4 20 l 7 7 7 7 0 2 4 3 5 4
m 8 8 8 6 8 8 8 8 8 6 n 15 15 15 15 12 30 30 12 36 12 o 21 21 21 21 0 6 12 9 15 12
10 Introduction
![Page 11: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/11.jpg)
PCA and Cluster Analysis Gene cell 1 cell 2 cell 3 cell 4 cell 5 cell 6 cell 7 cell 8 cell 9 cell 10
a 12 8 12 8 20 8 8 20 8 24 b 28 28 28 28 0 8 16 12 20 16 c 16 16 16 12 16 16 16 16 16 12 d 20 20 20 20 8 20 20 8 24 8 e 28 24 24 24 4 12 8 20 12 8 f 4 32 12 12 28 0 8 16 16 4 g 18 12 18 12 30 12 12 30 12 36 h 42 42 42 42 0 12 24 18 30 24 i 24 24 24 18 24 24 24 24 24 18 j 30 30 30 30 12 30 30 12 36 12 k 8 12 8 8 20 0 4 28 4 20 l 7 7 7 7 0 2 4 3 5 4
m 8 8 8 6 8 8 8 8 8 6 n 15 15 15 15 12 30 30 12 36 12 o 21 21 21 21 0 6 12 9 15 12
Cells Factor 1 Factor 2 cell1 0.93549 0.07226 cell2 0.81307 -0.0181 cell3 0.96315 0.0835 cell4 0.95191 -0.0752 cell5 -0.33566 0.75692 cell6 0.60245 0.0404 cell7 0.8073 0.0129 cell8 -0.01531 0.95305 cell9 0.8369 -0.07732 cell10 0.18234 0.85819
X axis
Y ax
is
PCA
Cluster Analysis
11 Introduction
![Page 12: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/12.jpg)
PRINCIPAL COMPONENT ANALYSIS Reduction of dimension
12
![Page 13: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/13.jpg)
1 dimension number line
Transcription from single cell
Gene mRNA count
a 12
b 5
c 16
d 8
e 7
f 4
13 Principle Component Analysis
![Page 14: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/14.jpg)
Two dimension graph
a
b
c d
e
f
02468
101214161820
0 5 10 15 20ce
ll 2
cell 1
2-D graph of two cells transcription profile
• So for 3 cells….it requires a 3-D data graph.
• For 4 cells…4 dimensional…which is not possible to draw on paper
• For 1000 cells…..1000-D (Impossible!)
Gene Cell1 Cell2
a 12 18
b 5 9
c 16 13
d 8 14
e 7 10
f 4 7
14 Principle Component Analysis
![Page 15: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/15.jpg)
Cell 1
Cel
l 2
Cel
l 2
Cell 1
Flattening of data. One axis showing most variability.
Principal Component Determination 15 Principle Component Analysis
![Page 16: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/16.jpg)
Principal Component Analysis • PCA compresses(flattens) multidimensional data(multiple
cell) into 2 or 3 dimensions which provides meaningful interpretation about the maximum variance in the data set.
• Flattening a Z stack of microscope images to make a 2-D image for paper.
16 Principle Component Analysis
![Page 17: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/17.jpg)
a
b
c d
e
f
g
h i
j k
l
0
2
4
6
8
10
12
14
16
18
20
0 2 4 6 8 10 12 14 16 18ce
ll 2
cell 1
2-D graph of two cells transcription profile
PC1: Most variation axis
PC2: the 2nd most variation axis
PC2
PC1
Principal Component Analysis
Gene Cell1 Cell2
a 12 18
b 5 9
c 16 13
d 8 14
e 7 10
f 4 7
g 10 12
h 15 17
i 16 18
j 14 15
k 12 14
l 9 11
17 Principle Component Analysis
![Page 18: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/18.jpg)
a
b
c d
e
f
g
h i
j k
l
0
2
4
6
8
10
12
14
16
18
20
0 2 4 6 8 10 12 14 16 18
cell
2
cell 1
PC2
PC3
PC1
For 500 cells….500 Principal component!
Principal Component Analysis
18 Principle Component Analysis
![Page 19: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/19.jpg)
a
b
c d
e
f
g
h i
j k
l
0
2
4
6
8
10
12
14
16
18
20
0 2 4 6 8 10 12 14 16 18ce
ll 2
cell 1
PC2 PC1
Variability extent on each PC Loading: Influence of each gene on the PC. Eigenvalue and Eigenvector: an array of loading for a PC with direction of the influence.
Gene Cell1 Cell2
a 12 18
b 5 9
c 16 13
d 8 14
e 7 10
f 4 7
g 10 12
Gene Influence on PC1
In value PC1
Influence on PC2
In value PC2
a Medium 5 High 3
b High -9 Low -0.1
c High 9 High -3.5
d Low -3 High 2
e Medium -6 Low -0.5
f High -11 Medium -1
g Low -0.5 Low -0.5
19 Principle Component Analysis
![Page 20: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/20.jpg)
Variability Scoring
Score cell based on transcription level and influence on each principal component: Cell1 PC1 =Σ(no. of expression of gene * respective influence on PC1) =(12*5)+(5*-9)+(16*9)+(8*-3)+……. =1 Cell1 PC2 =(18*3)+(9*-0.1)+(13*-3.5)+(14*2)+…… =1
Gene Cell1 Cell2
a 12 18
b 5 9
c 16 13
d 8 14
e 7 10
f 4 7
g 10 12
Gene Influence on PC1
In value PC1
Influence on PC2
In value PC2
a Medium 5 High 3
b High -9 Low -0.1
c High 9 High -3.5
d Low -3 High 2
e Medium -6 Low -0.5
f High -11 Medium -1
g Low -0.5 Low -0.5
20 Principle Component Analysis
![Page 21: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/21.jpg)
Cell1
0
2
4
6
8
10
12
0 2 4 6 8 10
PC2
PC1
Cell2
Cell3
Cell6 Cell5
Cell4
Plotting PC2 against PC1
Cell PC1 PC2
Cell 1 1 1
Cell 2 0.8 9.5
Cell 3 5.5 7.5
Cell 4 1.5 9
21 Principle Component Analysis
![Page 22: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/22.jpg)
Mathematical representation • X= [ ]nxm where X is the data matrix, with n no. of samples and m no. of measurements.
• PCA=Eigendecomposition, XTX=W where W is
the eigenvalues with eigenvectors (mXm matrix) and X
T is the X transpose matrix.
• T=XW where T is the score (nXm matrix).
Characteristics of W is such that each column is a PC and the eigenvalues are arranged in descending order.
22 Principle Component Analysis
![Page 23: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/23.jpg)
Assumptions 1. Linearity
2. Correlation among the variables
3. Large variance have more important dynamics
4. Sample size: 150+ cases.
5. All outliers should be removed
6. Components are uncorrelated
23 Principle Component Analysis
![Page 24: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/24.jpg)
PCA BY R
24
![Page 25: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/25.jpg)
Data Transcription level of 15 genes in 10 different cells.
Gene cell 1 cell 2 cell 3 cell 4 cell 5 cell 6 cell 7 cell 8 cell 9 cell 10
a 12 8 12 8 20 8 8 20 8 24 b 28 28 28 28 0 8 16 12 20 16 c 16 16 16 12 16 16 16 16 16 12 d 20 20 20 20 8 20 20 8 24 8 e 28 24 24 24 4 12 8 20 12 8 f 4 32 12 12 28 0 8 16 16 4 g 18 12 18 12 30 12 12 30 12 36 h 42 42 42 42 0 12 24 18 30 24 i 24 24 24 18 24 24 24 24 24 18 j 30 30 30 30 12 30 30 12 36 12 k 8 12 8 8 20 0 4 28 4 20 l 7 7 7 7 0 2 4 3 5 4
m 8 8 8 6 8 8 8 8 8 6 n 15 15 15 15 12 30 30 12 36 12 o 21 21 21 21 0 6 12 9 15 12
25 Principle Component Analysis R
![Page 26: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/26.jpg)
Data in R 26 Principle Component Analysis R
![Page 27: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/27.jpg)
Correlation among cells 27 Principle Component Analysis R
![Page 28: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/28.jpg)
PCA summary 28 Principle Component Analysis R
![Page 29: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/29.jpg)
Scree plot 29 Principle Component Analysis R
![Page 30: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/30.jpg)
Graphical representation PC2 vs PC1 30 Principle Component Analysis R
![Page 31: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/31.jpg)
Loadings by different components on each cell
31 Principle Component Analysis R
![Page 32: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/32.jpg)
Scores of genes 32 Principle Component Analysis R
![Page 33: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/33.jpg)
PCR IN SAS
33
![Page 34: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/34.jpg)
Data with code in SAS 34 Principle Component Analysis SAS
![Page 35: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/35.jpg)
Correlation among cells
35 Principle Component Analysis SAS
![Page 36: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/36.jpg)
PCA summary 36 Principle Component Analysis SAS
![Page 37: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/37.jpg)
cell1
cell2
cell3
cell4
cell5
cell6 cell7
cell8
cell9
cell10
-0.2
0
0.2
0.4
0.6
0.8
1
-0.4 -0.2 0 0.2 0.4 0.6 0.8 1 1.2
Com
p2
Comp1
PCA of gene transcription by different cells
Loadings by different components on each cell
37 Principle Component Analysis SAS
![Page 38: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/38.jpg)
Orthogonal rotation
38 Principle Component Analysis SAS
![Page 39: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/39.jpg)
Limitation of PCA • Requires numeric data for analysis
• 150+ data needed to get a representative factor trend.
• Loss of information due to dimension reduction
• Analysis is non conclusive. Needs explanatory factor analysis or cluster analysis to explain overall trend.
39 Principle Component Analysis
![Page 40: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/40.jpg)
CLUSTER ANALYSIS
40
![Page 41: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/41.jpg)
Cluster Analysis • An unsupervised learning tool
• It breaks down a large data set into smaller groups (i.e. clusters) where observations within a group are more similar than observations from other groups.
41 Cluster Analysis
![Page 42: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/42.jpg)
Algorithms • Hierarchical Cluster
• Non-Hierarchical Cluster (aka K-Means Cluster)
42 Cluster Analysis
![Page 43: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/43.jpg)
Euclidian Distance • Straight line distance between two points
43 Cluster Analysis
![Page 44: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/44.jpg)
Euclidian Distance
X axis
Y ax
is
44 Cluster Analysis
![Page 45: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/45.jpg)
Euclidian Distance
X axis
Y ax
is
45 Cluster Analysis
![Page 46: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/46.jpg)
Clustering
X axis
Y ax
is
46 Cluster Analysis
![Page 47: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/47.jpg)
GENERAL ASSUMPTIONS
47
![Page 48: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/48.jpg)
General Assumptions • Components (X & Y axis) are uncorrelated
• Some relationship among variables
• On a graph, points that are closer together share more similarities than points that are farther apart
• Large variances have more important dynamics in defining clusters
• Data is normalized/standardized • Euclidean Distance (straight line distance between 2 points)
assumes all parameters have the same scale for fair comparison between them
48 Cluster Analysis
![Page 49: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/49.jpg)
General Assumptions
X axis
Y ax
is
49 Cluster Analysis
![Page 50: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/50.jpg)
Normalization/Standardization Gene cell 1 cell 2 cell 3 cell 4 cell 5 cell 6 cell 7 cell 8 cell 9 cell 10
a 12 8 12 8 20 8 8 20 8 24 Size 4789 2334 1566 4678 2346 9654 2345 3567 1245 2366
Grow 0.2 0.05 0.08 0.13 0.67 0.23 0.05 0.76 0.08 0.23 # MITO 20 20 20 20 8 20 20 8 24 8
50
• Normalization scales all numeric variables in the range [0,1]
• Standardization transforms data to have zero mean and unit variance [-1,+1]
Cluster Analysis
![Page 51: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/51.jpg)
Cell Example Raw Data Gene cell 1 cell 2 cell 3 cell 4 cell 5 cell 6 cell 7 cell 8 cell 9 cell 10
a 12 8 12 8 20 8 8 20 8 24 b 28 28 28 28 0 8 16 12 20 16 c 16 16 16 12 16 16 16 16 16 12 d 20 20 20 20 8 20 20 8 24 8 e 28 24 24 24 4 12 8 20 12 8 f 4 32 12 12 28 0 8 16 16 4 g 18 12 18 12 30 12 12 30 12 36 h 42 42 42 42 0 12 24 18 30 24 i 24 24 24 18 24 24 24 24 24 18 j 30 30 30 30 12 30 30 12 36 12 k 8 12 8 8 20 0 4 28 4 20 l 7 7 7 7 0 2 4 3 5 4
m 8 8 8 6 8 8 8 8 8 6 n 15 15 15 15 12 30 30 12 36 12 o 21 21 21 21 0 6 12 9 15 12
51 Cluster Analysis
![Page 52: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/52.jpg)
Cell Example PCA Results
52
Cells Factor 1 Factor 2
cell1 0.93549 0.07226
cell2 0.81307 -0.0181
cell3 0.96315 0.0835
cell4 0.95191 -0.0752
cell5 -0.33566 0.75692
cell6 0.60245 0.0404
cell7 0.8073 0.0129
cell8 -0.01531 0.95305
cell9 0.8369 -0.07732
cell10 0.18234 0.85819
Cluster Analysis
![Page 53: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/53.jpg)
PCA and Cluster Analysis Gene cell 1 cell 2 cell 3 cell 4 cell 5 cell 6 cell 7 cell 8 cell 9 cell 10
a 12 8 12 8 20 8 8 20 8 24 b 28 28 28 28 0 8 16 12 20 16 c 16 16 16 12 16 16 16 16 16 12 d 20 20 20 20 8 20 20 8 24 8 e 28 24 24 24 4 12 8 20 12 8 f 4 32 12 12 28 0 8 16 16 4 g 18 12 18 12 30 12 12 30 12 36 h 42 42 42 42 0 12 24 18 30 24 i 24 24 24 18 24 24 24 24 24 18 j 30 30 30 30 12 30 30 12 36 12 k 8 12 8 8 20 0 4 28 4 20 l 7 7 7 7 0 2 4 3 5 4
m 8 8 8 6 8 8 8 8 8 6 n 15 15 15 15 12 30 30 12 36 12 o 21 21 21 21 0 6 12 9 15 12
Cells Factor 1 Factor 2 cell1 0.93549 0.07226 cell2 0.81307 -0.0181 cell3 0.96315 0.0835 cell4 0.95191 -0.0752 cell5 -0.33566 0.75692 cell6 0.60245 0.0404 cell7 0.8073 0.0129 cell8 -0.01531 0.95305 cell9 0.8369 -0.07732 cell10 0.18234 0.85819
X axis
Y ax
is
PCA
Cluster Analysis
53
Cell Example
Cluster Analysis
![Page 54: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/54.jpg)
HIERARCHICAL CLUSTER
54
![Page 55: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/55.jpg)
How many clusters?
55 Hierarchical Cluster Analysis
![Page 56: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/56.jpg)
Hierarchical Cluster • A series of steps that build a tree-like structure by either adding elements (i.e. agglomerative) to form a large cluster or by subtracting elements (i.e. divisive) from a large cluster to form smaller clusters
• Dendogram is used to visualize the results
56 Hierarchical Cluster Analysis
![Page 57: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/57.jpg)
Agglomerative Clustering
Single Linkage
Complete Linkage
Average Linkage
Centroid Linkage
Ward’s Linkage
57 Hierarchical Cluster Analysis
![Page 58: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/58.jpg)
Single Linkage
D (distance); c1, c2 (clusters); x1, y2 (distance between two elements) http://bit.ly/s-link
- Distance between closest elements in cluster - Produces long chains a b c … z
58 Hierarchical Cluster Analysis
![Page 59: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/59.jpg)
Single Linkage
D (distance); c1, c2 (clusters); x1, y2 (distance between two elements)
Complete Linkage
- Distance between closest elements in cluster - Produces long chains a b c … z
- Distance between farthest elements in clusters - Forces “spherical” clusters with consistent diameter
59 Hierarchical Cluster Analysis
![Page 60: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/60.jpg)
Single Linkage
D (distance); c1, c2 (clusters); x1, y2 (distance between two elements)
Complete Linkage
Average Linkage
- Distance between closest elements in cluster - Produces long chains a b c … z
- Distance between farthest elements in clusters - Forces “spherical” clusters with consistent diameter
- Average of all pairwise distances - Less affected by outliers
60 Hierarchical Cluster Analysis
![Page 61: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/61.jpg)
Single Linkage
D (distance); c1, c2 (clusters); x1, y2 (distance between two elements)
Complete Linkage
Average Linkage
Centroid Linkage
- Distance between closest elements in cluster - Produces long chains a b c … z
- Distance between farthest elements in clusters - Forces “spherical” clusters with consistent diameter
- Average of all pairwise distances - Less affected by outliers
- Distance between centroids (means) of two clusters - Requires numerical data
61 Hierarchical Cluster Analysis
![Page 62: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/62.jpg)
Single Linkage
D (distance); c1, c2 (clusters); x1, y2 (distance between two elements)
Complete Linkage
Average Linkage
Centroid Linkage
Ward’s Linkage
- Distance between closest elements in cluster - Produces long chains a b c … z
- Distance between farthest elements in clusters - Forces “spherical” clusters with consistent diameter
- Average of all pairwise distances - Less affected by outliers
- Distance between centroids (means) of two clusters - Requires numerical data
- Consider joining two clusters - Requires numerical data
62 Hierarchical Cluster Analysis
![Page 63: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/63.jpg)
Single Linkage Example
X axis
Y ax
is
63
Item X Y a b c d e f g h i j
Etc.
Hierarchical Cluster Analysis
![Page 64: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/64.jpg)
Single Linkage Example
X axis
Y ax
is
64 Hierarchical Cluster Analysis
![Page 65: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/65.jpg)
Single Linkage Example
X axis
Y ax
is
a b
c
d e
f g
h i
j k
l
m n
o p
q
r
a b c e d f g i h j k l m n p o r q
65 Hierarchical Cluster Analysis
![Page 66: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/66.jpg)
Single Linkage Example
X axis
Y ax
is
a b
c
d e
f g
h i
j k
l
m n
o p
q
r
a b c e d f g i h j k l m n p o r q
66 Hierarchical Cluster Analysis
![Page 67: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/67.jpg)
Single Linkage Example
X axis
Y ax
is
a b
c
d e
f g
h i
j k
l
m n
o p
q
r
a b c e d f g i h j k l m n p o r q
67 Hierarchical Cluster Analysis
![Page 68: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/68.jpg)
Single Linkage Example
X axis
Y ax
is
a b
c
d e
f g
h i
j k
l
m n
o p
q
r
a b c e d f g i h j k l m n p o r q
68 Hierarchical Cluster Analysis
![Page 69: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/69.jpg)
Single Linkage Example
X axis
Y ax
is
a b
c
d e
f g
h i
j k
l
m n
o p
q
r
a b c e d f g i h j k l m n p o r q
69 Hierarchical Cluster Analysis
![Page 70: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/70.jpg)
Single Linkage Example
X axis
Y ax
is
a b
c
d e
f g
h i
j k
l
m n
o p
q
r
a b c e d f g i h j k l m n p o r q
70
How many clusters do we have?
Hierarchical Cluster Analysis
![Page 71: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/71.jpg)
Single Linkage Example
X axis
Y ax
is
a b
c
d e
f g
h i
j k
l
m n
o p
q
r
a b c e d f g i h j k l m n p o r q
71 Hierarchical Cluster Analysis
![Page 72: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/72.jpg)
Single Linkage Example
X axis
Y ax
is
a b
c
d e
f g
h i
j k
l
m n
o p
q
r
a b c e d f g i h j k l m n p o r q
72 Hierarchical Cluster Analysis
![Page 73: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/73.jpg)
Single Linkage Example
X axis
Y ax
is
a b
c
d e
f g
h i
j k
l
m n
o p
q
r
a b c e d f g i h j k l m n p o r q
73 Hierarchical Cluster Analysis
![Page 74: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/74.jpg)
Limitations of Hierarchical Clustering • Single, Complete and Average Linkage can use numerical or categorical data as long as the distance is defined
• Centroid and Ward’s Linkage requires numerical (i.e. interval or ratio) data since the formula uses means
74 Hierarchical Cluster Analysis
![Page 75: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/75.jpg)
Limitations of Hierarchical Clustering • Underlying structure of the sample is unknown which makes it difficult to select the “correct” algorithm
• Poor cluster assignments cannot be modified
• Unstable solutions with a small sample (need at least 150 observations)
75 Hierarchical Cluster Analysis
![Page 76: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/76.jpg)
Limitations of Hierarchical Clustering • Outliers can affect clustering
• Single and Complete Linkage outliers can merge the wrong clusters
• Average Linkage is less affected by outliers because it computes average distances
• Centroid Linkage produces irregular shaped clusters where outliers influence the position of the centroid
• Ward’s Linkage tends to produce clusters with similar number of observations which makes it easy for outliers to distort results
76 Hierarchical Cluster Analysis
![Page 77: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/77.jpg)
HIERARCHICAL CLUSTER IN R
77
![Page 78: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/78.jpg)
PCA Results
Cells Factor 1 Factor 2 cell1 0.93549 0.07226 cell2 0.81307 -0.0181 cell3 0.96315 0.0835 cell4 0.95191 -0.0752 cell5 -0.33566 0.75692 cell6 0.60245 0.0404 cell7 0.8073 0.0129 cell8 -0.01531 0.95305 cell9 0.8369 -0.07732 cell10 0.18234 0.85819
78 Hierarchical Cluster Analysis R
![Page 79: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/79.jpg)
Open File
79 Hierarchical Cluster Analysis R
![Page 80: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/80.jpg)
Graph Data
80 Hierarchical Cluster Analysis R
![Page 81: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/81.jpg)
Graph Data
81 Hierarchical Cluster Analysis R
![Page 82: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/82.jpg)
Graph Data
82 Hierarchical Cluster Analysis R
![Page 83: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/83.jpg)
Standardize Data • Subtract first column to have quantitative data
83 Hierarchical Cluster Analysis R
![Page 84: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/84.jpg)
Standardize • Subtract mean and divide by standard deviation
84 Hierarchical Cluster Analysis R
![Page 85: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/85.jpg)
Euclidian Distance • Measures the distance between all the points
85 Hierarchical Cluster Analysis R
![Page 86: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/86.jpg)
Euclidian Distance • Measures the distance between all the points
86 Hierarchical Cluster Analysis R
![Page 87: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/87.jpg)
Hierarchical Clustering (complete)
87 Hierarchical Cluster Analysis R
![Page 88: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/88.jpg)
Hierarchical Clustering (complete)
88 Hierarchical Cluster Analysis R
![Page 89: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/89.jpg)
Hierarchical Clustering (complete)
89 Hierarchical Cluster Analysis R
![Page 90: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/90.jpg)
Hierarchical Clustering (complete)
90 Hierarchical Cluster Analysis R
![Page 91: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/91.jpg)
Hierarchical Clustering (average)
91 Hierarchical Cluster Analysis R
![Page 92: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/92.jpg)
Hierarchical Clustering (average)
92 Hierarchical Cluster Analysis R
![Page 93: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/93.jpg)
Cluster Mean
93 Hierarchical Cluster Analysis R
![Page 94: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/94.jpg)
Silhouette Plot
94 Hierarchical Cluster Analysis R
![Page 95: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/95.jpg)
Optimal Number of Clusters
95 Hierarchical Cluster Analysis R
![Page 96: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/96.jpg)
HIERARCHICAL CLUSTER IN SAS
96
![Page 97: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/97.jpg)
Import Data
97 Hierarchical Cluster Analysis SAS
![Page 98: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/98.jpg)
Hierarchical Cluster (centroid)
98 Hierarchical Cluster Analysis SAS
![Page 99: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/99.jpg)
Hierarchical Cluster (centroid)
99 Hierarchical Cluster Analysis SAS
![Page 100: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/100.jpg)
Hierarchical Cluster (centroid)
100 Hierarchical Cluster Analysis SAS
![Page 101: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/101.jpg)
Hierarchical Cluster (centroid)
101 Hierarchical Cluster Analysis SAS
![Page 102: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/102.jpg)
Hierarchical Cluster (centroid)
102 Hierarchical Cluster Analysis SAS
![Page 103: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/103.jpg)
K-MEANS CLUSTER
103
![Page 104: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/104.jpg)
K-Means Cluster • Most widely used for extra large data
• Observations can switch cluster membership
• Less impacted by outliers
• Multiple passes through the data allows the final solution to optimize within cluster homogeneity and between cluster heterogeneity
• Algorithm breaks the data into K clusters
• K is fixed
104 K-Means Cluster Analysis
![Page 105: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/105.jpg)
K-Means Cluster
105
Item X Y a
b
c
d
e
f
g
h
i
j
Etc.
K-Means Cluster Analysis
![Page 106: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/106.jpg)
K-Means Cluster
X axis
Y ax
is
106 K-Means Cluster Analysis
![Page 107: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/107.jpg)
K-Means Cluster
X axis
Y ax
is
107 K-Means Cluster Analysis
![Page 108: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/108.jpg)
K-Means Cluster
X axis
Y ax
is
108 K-Means Cluster Analysis
![Page 109: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/109.jpg)
K-Means Cluster
X axis
Y ax
is
109 K-Means Cluster Analysis
![Page 110: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/110.jpg)
K-Means Cluster
X axis
Y ax
is
110 K-Means Cluster Analysis
![Page 111: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/111.jpg)
K-Means Cluster
X axis
Y ax
is
111 K-Means Cluster Analysis
![Page 112: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/112.jpg)
K-Means Cluster
X axis
Y ax
is
a
b
c
d e
f
g
h i
j k
l
m
n
o
p
q
r
112 K-Means Cluster Analysis
![Page 113: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/113.jpg)
K-Means Cluster
X axis
Y ax
is
113 K-Means Cluster Analysis
![Page 114: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/114.jpg)
K-Means Cluster
X axis
Y ax
is
114 K-Means Cluster Analysis
![Page 115: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/115.jpg)
K-Means Cluster
X axis
Y ax
is
115 K-Means Cluster Analysis
![Page 116: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/116.jpg)
K-Means Cluster
X axis
Y ax
is
116 K-Means Cluster Analysis
![Page 117: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/117.jpg)
K-Means Cluster
X axis
Y ax
is
117 K-Means Cluster Analysis
![Page 118: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/118.jpg)
K-Means Cluster
X axis
Y ax
is
118 K-Means Cluster Analysis
![Page 119: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/119.jpg)
K-Means Cluster
X axis
Y ax
is
a
b
c
d e
f
g
h i
j k
l
m
n
o
p
q
r
119 K-Means Cluster Analysis
![Page 120: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/120.jpg)
Limitations of K-Means Clustering • Underlying structure of the sample is unknown which makes it difficult to determine the number of clusters (K) needed in advance
• Poor cluster assignments cannot be modified
• Unstable solutions with a small sample (need at least 150 observations)
• Forces clusters to be round
• Outliers can distort clusters
120 K-Means Cluster Analysis
![Page 121: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/121.jpg)
K-MEANS CLUSTER IN R
121
![Page 122: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/122.jpg)
Cell Example
Cells Factor 1 Factor 2 cell1 0.93549 0.07226 cell2 0.81307 -0.0181 cell3 0.96315 0.0835 cell4 0.95191 -0.0752 cell5 -0.33566 0.75692 cell6 0.60245 0.0404 cell7 0.8073 0.0129 cell8 -0.01531 0.95305 cell9 0.8369 -0.07732 cell10 0.18234 0.85819
122 K-Means Cluster Analysis R
![Page 123: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/123.jpg)
K-Means Cluster
123 K-Means Cluster Analysis R
![Page 124: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/124.jpg)
K-Means Cluster
124 K-Means Cluster Analysis R
![Page 125: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/125.jpg)
Remember? Hierarchical Cluster (centroid)
125 K-Means Cluster Analysis R
![Page 126: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/126.jpg)
K-Means Cluster
126 K-Means Cluster Analysis R
![Page 127: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/127.jpg)
K-MEANS CLUSTER IN SAS
127
![Page 128: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/128.jpg)
K-Means Cluster
128 K-Means Cluster Analysis SAS
![Page 129: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/129.jpg)
K-Means Cluster
129 K-Means Cluster Analysis SAS
![Page 130: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/130.jpg)
GENERAL LIMITATIONS
130
![Page 131: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/131.jpg)
General Limitations • No test statistic available to validate the significance of the result
• Cluster dimensions are often randomly chosen and may not reflect real conditions can be a statistical artifact
• Cluster analysis is powerful enough that it will provide a cluster even if no meaningful groups are embedded in the sample
131 Cluster Analysis
![Page 132: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/132.jpg)
General Limitations • Choosing the variables used to group observations is the most important and different approaches may lead to different clusters • How to select the variable
• Whether or not to standardize/normalize data
• How to address multicollinearity use PCA
• High correlation among variables can be an issue because it may overweight other important variables
• PCA is also controversial since low eigenvalues are dropped which may exclude factors that represent unique and important information
132 Cluster Analysis
![Page 133: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/133.jpg)
Best Practice • Use Hierarchical first to determine the optimal number of clusters followed by K-Means Clustering to optimize the shape of the clusters
133 Cluster Analysis
![Page 134: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/134.jpg)
The End
134
![Page 135: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/135.jpg)
135
• A good example of PCA and Cell Clustering can be seen in this paper:
• Pollen et al. (2014). Low-coverage single cell mRNA sequencing reveals cellular heterogeneity and activated signaling pathways in developing cerebral cortex. Nature. Beiotech. 32:1053–1058
• doi:10.1038/nbt.2967
![Page 136: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/136.jpg)
![Page 137: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/137.jpg)
cell5 cell8
cell10
a
b
c d
e
f
g
h
i j
k
l
m
n
o
-6
-4
-2
0
2
4
6
8
-4 -3 -2 -1 0 1 2 3 4 5 6
PC2
20.5
%
PC 1(56%)
PCA of gene transcription by different kinds of cells
cell obs
genes
![Page 138: MULTIVARIATE STATISTICS - York University › shore › PCA and Cluster_Final 2017.pdfMULTIVARIATE STATISTICS Principal Component Analysis and Cluster Analysis November 6 th, 2017](https://reader033.vdocuments.net/reader033/viewer/2022052800/5f0f243d7e708231d442b180/html5/thumbnails/138.jpg)
Divisive Clustering
Monotheic
Polythetic
138