![Page 1: PCA for analysis of complex multivariate data](https://reader033.vdocuments.net/reader033/viewer/2022061614/56812ed3550346895d947391/html5/thumbnails/1.jpg)
PCA for analysis of complex multivariate data
![Page 2: PCA for analysis of complex multivariate data](https://reader033.vdocuments.net/reader033/viewer/2022061614/56812ed3550346895d947391/html5/thumbnails/2.jpg)
Interpretation of large data tables by PCA
• In industry, research and finance the amount of data is often very large
• Little information is available a priori
• There is a need for methods that make few assumptions and give a simple, easily understandable overview
– Overall broad interpretation
– Ideas for further analyses
– Generating hypotheses
• PCA is such a method
![Page 3: PCA for analysis of complex multivariate data](https://reader033.vdocuments.net/reader033/viewer/2022061614/56812ed3550346895d947391/html5/thumbnails/3.jpg)
PCA used for
• Interpretation
• Pre-processing for regression
• Classification
• Statistical process control (SPC)
• Noise reduction
• Pre-processing for other statistical analyses
![Page 4: PCA for analysis of complex multivariate data](https://reader033.vdocuments.net/reader033/viewer/2022061614/56812ed3550346895d947391/html5/thumbnails/4.jpg)
Examples of use in industry
• Process monitoring
• Sensory analysis (tasting etc.)
– Product development and quality control
• Rheological measurements
• Process prediction
• Spectroscopy (NIR and other)
![Page 5: PCA for analysis of complex multivariate data](https://reader033.vdocuments.net/reader033/viewer/2022061614/56812ed3550346895d947391/html5/thumbnails/5.jpg)
Examples of use outside industry
• Psychology
• Food science
• Information retrieval systems
• Consumer studies, marketing
![Page 6: PCA for analysis of complex multivariate data](https://reader033.vdocuments.net/reader033/viewer/2022061614/56812ed3550346895d947391/html5/thumbnails/6.jpg)
PCA
1. Compresses the information
– Finds the directions with most variability
– Projects the information down on these dimensions
2. Presents the information in simple plots
– Scores plot
• Projection of data onto subspace
– Loadings plot
• Plot of relation between original variables and subspace dimensions
![Page 7: PCA for analysis of complex multivariate data](https://reader033.vdocuments.net/reader033/viewer/2022061614/56812ed3550346895d947391/html5/thumbnails/7.jpg)
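Data structure for PCA, data matrix
$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1K} \\ x_{21} & x_{22} & \cdots & x_{2K} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N1} & x_{N2} & \cdots & x_{NK} \end{pmatrix}$$
Rows are objects ("samples"); columns are variables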
![Page 8: PCA for analysis of complex multivariate data](https://reader033.vdocuments.net/reader033/viewer/2022061614/56812ed3550346895d947391/html5/thumbnails/8.jpg)
Scatter plots, vectors
• Vector x = (x1, x2, …, xK)
• A vector can be plotted as a point. Plotting several vectors together gives a scatter plot
![Page 9: PCA for analysis of complex multivariate data](https://reader033.vdocuments.net/reader033/viewer/2022061614/56812ed3550346895d947391/html5/thumbnails/9.jpg)
[Figure: a point x = (x1, x2, x3) plotted in three dimensions with axes x1, x2 and x3]
![Page 10: PCA for analysis of complex multivariate data](https://reader033.vdocuments.net/reader033/viewer/2022061614/56812ed3550346895d947391/html5/thumbnails/10.jpg)
Principal component analysis
[Figure: the data matrix X (objects × variables) is passed to PCA, which produces a scores plot, a loadings plot and other results]
![Page 11: PCA for analysis of complex multivariate data](https://reader033.vdocuments.net/reader033/viewer/2022061614/56812ed3550346895d947391/html5/thumbnails/11.jpg)
[Figure: data X in the space of the variables X1, X2 and X3, with the directions PC 1 and PC 2 drawn in]
![Page 12: PCA for analysis of complex multivariate data](https://reader033.vdocuments.net/reader033/viewer/2022061614/56812ed3550346895d947391/html5/thumbnails/12.jpg)
Model
PCA model: $X = TP^{T} + E$
The matrix X is modelled as components (systematic effects) plus residuals, E (noise)
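A minimal sketch of how the model above could be computed, assuming the columns of X are mean-centred first and the decomposition is obtained from a singular value decomposition. The function name `pca` and the random data are illustrative only and are not taken from the slides.

```python
import numpy as np

def pca(X, n_components):
    """Return scores T, loadings P, residuals E and the column means of X."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean                                   # centre each variable (assumption)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                         # loadings, K x A
    T = U[:, :n_components] * s[:n_components]      # scores, N x A
    E = Xc - T @ P.T                                # residuals, N x K
    return T, P, E, mean

# Hypothetical data: 20 objects (rows), 5 variables (columns)
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
T, P, E, mean = pca(X, n_components=2)
```

Plotting the first column of `T` against the second gives a scores plot, and plotting the first column of `P` against the second gives a loadings plot.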
![Page 13: PCA for analysis of complex multivariate data](https://reader033.vdocuments.net/reader033/viewer/2022061614/56812ed3550346895d947391/html5/thumbnails/13.jpg)
The main plots
• Scores plot
– For interpreting relations among samples
• Loadings plot
– For interpreting relations among variables
• Explained variance plot
![Page 14: PCA for analysis of complex multivariate data](https://reader033.vdocuments.net/reader033/viewer/2022061614/56812ed3550346895d947391/html5/thumbnails/14.jpg)
Scores plot/projection (T)
[Figure: scores plot with axes t1 (PC1) and t2 (PC2); explained variance labels 70% and 25%]
![Page 15: PCA for analysis of complex multivariate data](https://reader033.vdocuments.net/reader033/viewer/2022061614/56812ed3550346895d947391/html5/thumbnails/15.jpg)
Loadings plot
[Figure: the variables x1, x2 and x3 plotted against the axes pc1 and pc2]
![Page 16: PCA for analysis of complex multivariate data](https://reader033.vdocuments.net/reader033/viewer/2022061614/56812ed3550346895d947391/html5/thumbnails/16.jpg)
Loadings plots
• Usually 2-dimensional
• For spectroscopy and other continuous measurements, 1-dimensional plots are used.
![Page 17: PCA for analysis of complex multivariate data](https://reader033.vdocuments.net/reader033/viewer/2022061614/56812ed3550346895d947391/html5/thumbnails/17.jpg)
Guidelines for how to interpret the plots
• Variables which are close have high correlation
• Samples which are close are similar
• Variables on opposite sides of the origin have negative correlation
• Objects on the right of the scores plot are dominated by the variables on the right of the loadings plot, and similarly for the other directions
![Page 18: PCA for analysis of complex multivariate data](https://reader033.vdocuments.net/reader033/viewer/2022061614/56812ed3550346895d947391/html5/thumbnails/18.jpg)
Variance per component
• The sum of the variances of the original x-variables is equal to the sum of the variances of the scores.
• We can talk about variance per component and explained variance (in %) per component
• Explained variance can be presented cumulatively or per component
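The variance identity above can be checked numerically. The sketch below assumes mean-centred data and scores computed for all components via SVD. The data and variable names are hypothetical.

```python
import numpy as np

# Hypothetical data: 30 objects, 4 variables with different spreads
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4)) * np.array([3.0, 2.0, 1.0, 0.5])

Xc = X - X.mean(axis=0)                       # centre each variable
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
T = U * s                                     # scores for all components

var_x = Xc.var(axis=0, ddof=1)                # variance of each original variable
var_t = T.var(axis=0, ddof=1)                 # variance of each score vector

explained = 100 * var_t / var_t.sum()         # explained variance (%) per component
cumulative = np.cumsum(explained)             # cumulative explained variance (%)

print(var_x.sum(), var_t.sum())               # the two sums agree
print(explained, cumulative)
```

`explained` corresponds to a non-cumulative plot and `cumulative` to a cumulative plot of explained variance.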
![Page 19: PCA for analysis of complex multivariate data](https://reader033.vdocuments.net/reader033/viewer/2022061614/56812ed3550346895d947391/html5/thumbnails/19.jpg)
[Figure: cumulative plot of explained variance (in % or absolute units, here marked at 50% and 100%) against the number of components (1, 2, 3)]
![Page 20: PCA for analysis of complex multivariate data](https://reader033.vdocuments.net/reader033/viewer/2022061614/56812ed3550346895d947391/html5/thumbnails/20.jpg)
[Figure: non-cumulative plot of explained variance (in % or absolute units, here marked at 0.5 and 1.0) against the number of components (1, 2, 3)]
Bar plots can also be used
![Page 21: PCA for analysis of complex multivariate data](https://reader033.vdocuments.net/reader033/viewer/2022061614/56812ed3550346895d947391/html5/thumbnails/21.jpg)
Sensory analysis of sausages: goals of the analysis
• Investigate the possibility of using dairy ingredients in sausages
– Type and concentration
– Focus on sensory properties
• Investigate the interaction of dairy ingredients with other ingredients and process parameters
• Characterise the differences among the dairy ingredients used in sausages
![Page 22: PCA for analysis of complex multivariate data](https://reader033.vdocuments.net/reader033/viewer/2022061614/56812ed3550346895d947391/html5/thumbnails/22.jpg)
Sensory analysis of sausages
• Factorial design in 4 variables
– 5 dairy ingredients
• Na caseinate
• Na caseinate (high viscosity)
• Skim milk
• Whey protein
• Demineralised whey powder
– 3 concentration levels
• 1%, 3% and 5%
– 2 starch levels
• 2% and 4%
– 2 cooking temperatures
• 76 and 82 °C
Published: Baardseth et al., J. Food Science.
![Page 23: PCA for analysis of complex multivariate data](https://reader033.vdocuments.net/reader033/viewer/2022061614/56812ed3550346895d947391/html5/thumbnails/23.jpg)
Variables/attributes used
• Graininess
• Stickiness
• Firmness
• Juiciness
• Fatness
• Elasticity
• Colour hue
• Colour intensity
• Whiteness
• Meat taste
• Off-taste
• Rancidity
• Smokiness
![Page 24: PCA for analysis of complex multivariate data](https://reader033.vdocuments.net/reader033/viewer/2022061614/56812ed3550346895d947391/html5/thumbnails/24.jpg)
[Figure: plot labelled 70%]
![Page 25: PCA for analysis of complex multivariate data](https://reader033.vdocuments.net/reader033/viewer/2022061614/56812ed3550346895d947391/html5/thumbnails/25.jpg)
Loadings and scores
Scores split up according to ingredient on next slide
![Page 26: PCA for analysis of complex multivariate data](https://reader033.vdocuments.net/reader033/viewer/2022061614/56812ed3550346895d947391/html5/thumbnails/26.jpg)
[Figure: scores split up by ingredient (demineralised whey powder, Na caseinate, Na caseinate (high viscosity), skim milk, whey protein), with samples marked as above average or below average]
Can also be done using colours
![Page 27: PCA for analysis of complex multivariate data](https://reader033.vdocuments.net/reader033/viewer/2022061614/56812ed3550346895d947391/html5/thumbnails/27.jpg)
We have got information about
• Which samples are similar
• Which variables are similar or very different
• Which samples are characterised by which variables
• Which design variables are most important for the variation
• Differences among the ingredients
![Page 28: PCA for analysis of complex multivariate data](https://reader033.vdocuments.net/reader033/viewer/2022061614/56812ed3550346895d947391/html5/thumbnails/28.jpg)
Pre-processing
• If variables are in very different units, it may be advantageous to standardise the variables prior to PCA
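• $x_{new} = x_{old} / \mathrm{std}(x)$ for each variable
• Be aware of noise: standardisation also scales up variables that contain mostly noise. Whether a variable carries real variation can be tested by ANOVA or replicates.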
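A small sketch of the standardisation step, dividing each column by its own standard deviation. The function name `standardise` and the example values (loosely viscosity, pH and water content) are hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption.

```python
import numpy as np

def standardise(X):
    """Divide each variable (column) by its sample standard deviation."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0, ddof=1)
    if np.any(std == 0):
        raise ValueError("a constant variable cannot be standardised")
    return X / std

# Hypothetical measurements in very different units
X = np.array([[1200.0, 6.8, 0.31],
              [1500.0, 7.1, 0.29],
              [ 900.0, 6.5, 0.35],
              [1100.0, 7.0, 0.30]])
X_new = standardise(X)
print(X_new.std(axis=0, ddof=1))   # every variable now has unit standard deviation
```

After this step every variable contributes the same variance to the PCA, which is why a variable that is mostly noise deserves a second look before it is included.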
![Page 29: PCA for analysis of complex multivariate data](https://reader033.vdocuments.net/reader033/viewer/2022061614/56812ed3550346895d947391/html5/thumbnails/29.jpg)
[Figure: standard deviations of the variables viscosity, pH, water content and temperature]
Variables of different types are difficult to compare
![Page 30: PCA for analysis of complex multivariate data](https://reader033.vdocuments.net/reader033/viewer/2022061614/56812ed3550346895d947391/html5/thumbnails/30.jpg)
Pre-processing
• Standardisation is usually not done in spectroscopy
• It is very important if measurements from different instruments are used together
![Page 31: PCA for analysis of complex multivariate data](https://reader033.vdocuments.net/reader033/viewer/2022061614/56812ed3550346895d947391/html5/thumbnails/31.jpg)
Outlier detection
• Outliers may always be present
• They influence the solution
• They may represent new information
• It is important to detect them
![Page 32: PCA for analysis of complex multivariate data](https://reader033.vdocuments.net/reader033/viewer/2022061614/56812ed3550346895d947391/html5/thumbnails/32.jpg)
Tools for outlier detection
• Residuals: $E = X - \hat{X}$, where $\hat{X} = TP^{T}$
– Plot residuals per object
– Compute sum of squared residuals per object
• Leverage, distance to mean within space (Mahalanobis distance)
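The two outlier tools can be sketched as follows, assuming mean-centred data and an SVD-based PCA. The leverage formula used here, 1/N plus the sum over components of the squared score divided by the sum of squared scores for that component, is one common convention and is an assumption, not something stated in the slides. The function name and the simulated data are hypothetical.

```python
import numpy as np

def outlier_statistics(X, n_components):
    """Return the sum of squared residuals and the leverage for each object."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]     # scores
    P = Vt[:n_components].T                        # loadings
    E = Xc - T @ P.T                               # residual matrix
    q = np.sum(E**2, axis=1)                       # squared residuals per object
    leverage = 1.0 / X.shape[0] + np.sum(T**2 / np.sum(T**2, axis=0), axis=1)
    return q, leverage

# Hypothetical data: 30 objects close to a plane in 3 dimensions
rng = np.random.default_rng(2)
X = np.zeros((30, 3))
X[:, :2] = rng.normal(scale=5.0, size=(30, 2))
X += rng.normal(scale=0.1, size=(30, 3))
X[0, 2] += 3.0                                     # move object 0 off the plane

q, leverage = outlier_statistics(X, n_components=2)
print(np.argmax(q))                                # object with the largest residual
```

An object with a large `q` lies far from the PCA plane, while an object with a large `leverage` lies far from the mean within the plane.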
![Page 33: PCA for analysis of complex multivariate data](https://reader033.vdocuments.net/reader033/viewer/2022061614/56812ed3550346895d947391/html5/thumbnails/33.jpg)
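[Figure: the PCA plane in the space of x1, x2 and x3, showing "normal samples" near the plane, a leverage point far from the mean within the plane, and a sample off the plane with residual e]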
![Page 34: PCA for analysis of complex multivariate data](https://reader033.vdocuments.net/reader033/viewer/2022061614/56812ed3550346895d947391/html5/thumbnails/34.jpg)
Validation
• Plots: how natural is the solution? Relate it to prior knowledge and the design.
• Steep increase of explained variance
• Can also use cross-validation
– Leave out one sample, fit the model on the rest and test it on the left-out sample. Repeat for all samples.
– Compute explained prediction variance.
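One simple way to sketch the cross-validation idea is a row-wise leave-one-out loop: fit the PCA on all samples but one, project the left-out sample onto the loadings, and accumulate its squared residual. This is an assumption about the scheme, not the procedure from the slides, and the function name and data are hypothetical. With this simple scheme the explained prediction variance never decreases as components are added, so it is best read alongside the plots rather than as an automatic rule for choosing the number of components.

```python
import numpy as np

def loo_explained_prediction_variance(X, max_components):
    """Leave-one-out explained prediction variance (fraction) for 1..max_components."""
    X = np.asarray(X, dtype=float)
    press = np.zeros(max_components)      # prediction residual sum of squares
    total = 0.0                           # total sum of squares of left-out samples
    for i in range(X.shape[0]):
        train = np.delete(X, i, axis=0)   # all samples except sample i
        mean = train.mean(axis=0)
        _, _, Vt = np.linalg.svd(train - mean, full_matrices=False)
        x = X[i] - mean                   # left-out sample, centred with training mean
        total += x @ x
        for a in range(1, max_components + 1):
            P = Vt[:a].T                  # loadings of the a-component model
            e = x - P @ (P.T @ x)         # residual after projection onto the model
            press[a - 1] += e @ e
    return 1.0 - press / total

# Hypothetical data: 25 objects, 4 variables
rng = np.random.default_rng(3)
X = rng.normal(size=(25, 4)) * np.array([4.0, 2.0, 1.0, 0.5])
ev = loo_explained_prediction_variance(X, max_components=4)
print(ev)
```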