Descriptive Analysis and PCA
Dominique Valentin, ENSBANA/CESG. Hervé Abdi, The University of Texas at Dallas.
![Page 1: Descriptive Analysis and PCA](https://reader036.vdocuments.net/reader036/viewer/2022062409/568151b6550346895dbfe34a/html5/thumbnails/1.jpg)
Descriptive Analysis and PCA
Hervé Abdi, The University of Texas at Dallas
Dominique Valentin, ENSBANA/CESG
![Page 2: Descriptive Analysis and PCA](https://reader036.vdocuments.net/reader036/viewer/2022062409/568151b6550346895dbfe34a/html5/thumbnails/2.jpg)
Back to the yogurt example
Texture
Thickness: consistency of the mass in the mouth
Rate of melt: amount of product melted after a certain pressure of the tongue
Graininess: amount of particles in the mass
Mouth coating: amount of film left on the mouth surfaces
Basic tastes
Sweet: sucrose
Sour: lactic acid
Bitter: caffeine
Salty: sodium chloride
Aroma
Water: tastes watered down
Flour: 1 spoon of flour mixed in water
Wood: shavings from pencil sharpening
Chalk: Smecta
Milk: whole milk
Raw pie crust: commercial raw pie crust
Cream: crème fraîche
Hazelnut: hazelnut powder
Earthy: earth
Mushroom: dried mushrooms soaked in water
![Page 3: Descriptive Analysis and PCA](https://reader036.vdocuments.net/reader036/viewer/2022062409/568151b6550346895dbfe34a/html5/thumbnails/3.jpg)
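9 panelists
5 yogurts: 2 cow-milk yogurts, 3 soy yogurts
Each descriptor is rated on a scale anchored from "Not at all" to "Very", for example:
Bitter: Not at all ... Very
Salty: Not at all ... Very
Astringent: Not at all ... Very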
Back to the yogurt example
![Page 4: Descriptive Analysis and PCA](https://reader036.vdocuments.net/reader036/viewer/2022062409/568151b6550346895dbfe34a/html5/thumbnails/4.jpg)
[Figure: Texture. Four bar charts of mean intensity (Intensité moyenne, scale 0 to 10) for five yogurts (soja carrefour, sojasun, sojade, velouté danone, leaderprice), one chart per descriptor: Farineux (floury), Épais (thick), Gras (mouth coating), Fondant (melting). Letters on the bars (a, ab, b, ...) label groups of products; their assignment to individual bars is not recoverable from the transcript.]
Back to the yogurt example
![Page 5: Descriptive Analysis and PCA](https://reader036.vdocuments.net/reader036/viewer/2022062409/568151b6550346895dbfe34a/html5/thumbnails/5.jpg)
[Figure: Taste. Four bar charts of mean intensity (Intensité moyenne, scale 0 to 10) for five yogurts (soja carrefour, sojasun, sojade, velouté danone, leaderprice), one chart per descriptor: Astringent, Sucré (sweet), Acide (sour), Amer (bitter). Letters on the bars (a, ab, b, ...) label groups of products; their assignment to individual bars is not recoverable from the transcript.]
Back to the yogurt example
![Page 6: Descriptive Analysis and PCA](https://reader036.vdocuments.net/reader036/viewer/2022062409/568151b6550346895dbfe34a/html5/thumbnails/6.jpg)
[Figure: Aroma. Four bar charts of mean intensity (Intensité moyenne, scale 0 to 10) for five yogurts (soja carrefour, sojasun, sojade, velouté danone, leaderprice), one chart per descriptor: Farine (flour), Craie (chalk), Crème (cream), Noisette (hazelnut). Letters on the bars (a, ab, b, ...) label groups of products; their assignment to individual bars is not recoverable from the transcript.]
Back to the yogurt example
![Page 7: Descriptive Analysis and PCA](https://reader036.vdocuments.net/reader036/viewer/2022062409/568151b6550346895dbfe34a/html5/thumbnails/7.jpg)
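[Figure: PCA of the yogurt profiles, Factor 1 (61.04 % of the variance) against Factor 2 (17.84 %). First panel, circle of correlations for the descriptors: farineux (floury), épais (thick), gras (mouth coating), fondant (melting), sucré (sweet), acide (sour), astringent, eau (water), farine (flour), bois (wood), craie (chalk), lait (milk), crème (cream), noisette (hazelnut), terreux (earthy), champignon (mushroom). Second panel, projection of the products: soja bio, soja champion, soja leaderprice, soja carrefour, soja bifidus, soja sun, sojade, soja délice, carrefour, velouté danone, danone bifidus, leader price.]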
A solution: Principal Component Analysis
![Page 8: Descriptive Analysis and PCA](https://reader036.vdocuments.net/reader036/viewer/2022062409/568151b6550346895dbfe34a/html5/thumbnails/8.jpg)
What is PCA?
A statistical technique used to transform a number of correlated variables into a smaller number of uncorrelated variables called principal components.
The first principal component accounts for as much of the variability in the data as possible, and each succeeding component accounts for as much of the remaining variability as possible.
The mathematical technique used in PCA is called eigenanalysis (eigendecomposition).
![Page 9: Descriptive Analysis and PCA](https://reader036.vdocuments.net/reader036/viewer/2022062409/568151b6550346895dbfe34a/html5/thumbnails/9.jpg)
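When to use PCA?
To analyze two-dimensional data tables describing I observations with J quantitative variables.
[Schematic: data table with the observations 1, ..., i, ..., I as rows, the variables 1, ..., j, ..., J as columns, and y_ij as the value of variable j for observation i.]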
![Page 10: Descriptive Analysis and PCA](https://reader036.vdocuments.net/reader036/viewer/2022062409/568151b6550346895dbfe34a/html5/thumbnails/10.jpg)
Why use PCA?
1. To evaluate the similarity between the observations, here the products
2. To detect structure in the relationships between the variables, here the descriptors
3. To reduce the number of variables to allow for a graphical representation of the data
In short: to give a synthetic description of the products
![Page 11: Descriptive Analysis and PCA](https://reader036.vdocuments.net/reader036/viewer/2022062409/568151b6550346895dbfe34a/html5/thumbnails/11.jpg)
General principle of PCA
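[Schematic: the I × J data table (observations 1, ..., i, ..., I by variables 1, ..., j, ..., J, with entries y_ij) is transformed by diagonalization (eigenanalysis) into an I × K table of principal components (PC1, ..., PCk, ..., PCK, with entries Cp_ik). Two plots on the PC1 and PC2 axes are derived from it: the circle of correlations (variables) and the projection of the observations.]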
![Page 12: Descriptive Analysis and PCA](https://reader036.vdocuments.net/reader036/viewer/2022062409/568151b6550346895dbfe34a/html5/thumbnails/12.jpg)
A baby example: wine profile
| Wine | Amber | Blackcurrant | Coconut | Leather | Musk | Gooseberry | Woody | Vanilla | Raspberry |
|------|-------|--------------|---------|---------|------|------------|-------|---------|-----------|
| v1   | 7 | 3 | 1 | 6 | 9 | 3 | 1 | 0 | 2 |
| v2   | 0 | 5 | 1 | 1 | 0 | 7 | 0 | 1 | 6 |
| v3   | 1 | 9 | 0 | 0 | 0 | 6 | 1 | 1 | 5 |
| v4   | 1 | 6 | 7 | 0 | 1 | 6 | 4 | 6 | 4 |
| v5   | 6 | 1 | 8 | 5 | 4 | 2 | 5 | 5 | 1 |
| v6   | 1 | 6 | 5 | 1 | 0 | 5 | 5 | 7 | 6 |
| v7   | 7 | 3 | 1 | 6 | 8 | 2 | 1 | 0 | 2 |
| v8   | 6 | 3 | 0 | 5 | 5 | 3 | 1 | 1 | 3 |
| v9   | 0 | 4 | 4 | 1 | 0 | 7 | 6 | 5 | 5 |
| v10  | 4 | 2 | 6 | 5 | 6 | 2 | 5 | 7 | 1 |
| v11  | 5 | 1 | 4 | 6 | 7 | 1 | 6 | 7 | 2 |
| v12  | 1 | 6 | 0 | 1 | 0 | 5 | 0 | 1 | 8 |
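The wine profile table can be run through a PCA in a few lines. The sketch below is one possible way to do it with NumPy, not necessarily the procedure used for the slides: it centers the table, takes a singular value decomposition, and reports the share of variance per component and the correlations of the descriptors with the first two components. The variable names (`descriptors`, `scores`, `loadings`) are illustrative choices, and the descriptor order assumes the header reads left to right.

```python
import numpy as np

descriptors = ["Amber", "Blackcurrant", "Coconut", "Leather", "Musk",
               "Gooseberry", "Woody", "Vanilla", "Raspberry"]

# Rows are the wines v1..v12, columns follow `descriptors`.
X = np.array([
    [7, 3, 1, 6, 9, 3, 1, 0, 2],
    [0, 5, 1, 1, 0, 7, 0, 1, 6],
    [1, 9, 0, 0, 0, 6, 1, 1, 5],
    [1, 6, 7, 0, 1, 6, 4, 6, 4],
    [6, 1, 8, 5, 4, 2, 5, 5, 1],
    [1, 6, 5, 1, 0, 5, 5, 7, 6],
    [7, 3, 1, 6, 8, 2, 1, 0, 2],
    [6, 3, 0, 5, 5, 3, 1, 1, 3],
    [0, 4, 4, 1, 0, 7, 6, 5, 5],
    [4, 2, 6, 5, 6, 2, 5, 7, 1],
    [5, 1, 4, 6, 7, 1, 6, 7, 2],
    [1, 6, 0, 1, 0, 5, 0, 1, 8],
], dtype=float)

# Center each descriptor, then decompose the centered table.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

scores = U * s                        # coordinates of the wines on the components
explained = s**2 / np.sum(s**2)       # share of variance per component

# Correlation of each descriptor with the first two components
# (the coordinates used to draw a circle of correlations).
loadings = np.array([
    [np.corrcoef(X[:, j], scores[:, k])[0, 1] for k in range(2)]
    for j in range(X.shape[1])
])

print("Share of variance, first two components:", np.round(explained[:2], 3))
for name, (r1, r2) in zip(descriptors, loadings):
    print(f"{name:12s} r(PC1) = {r1:+.2f}   r(PC2) = {r2:+.2f}")
```

Descriptors that end up close together in the printed loadings are the ones the wines tend to share; the sign of each component is arbitrary, so a mirrored picture is equally valid.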
![Page 13: Descriptive Analysis and PCA](https://reader036.vdocuments.net/reader036/viewer/2022062409/568151b6550346895dbfe34a/html5/thumbnails/13.jpg)
A baby example: wine profile
![Page 14: Descriptive Analysis and PCA](https://reader036.vdocuments.net/reader036/viewer/2022062409/568151b6550346895dbfe34a/html5/thumbnails/14.jpg)
A baby example: wine profile
![Page 15: Descriptive Analysis and PCA](https://reader036.vdocuments.net/reader036/viewer/2022062409/568151b6550346895dbfe34a/html5/thumbnails/15.jpg)
How to find the principal components?
Step 1: get some data
Step 2: subtract the means of the variables
Step 3: find the eigenvectors and eigenvalues of the covariance matrix
Step 4: find the principal components by projecting the observations onto the eigenvectors
Step 5: compute the loadings as the correlations between the original variables and the principal components
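The five steps above can be sketched as a single function. This is a minimal illustration under stated assumptions rather than a reference implementation: the function name `pca_five_steps` and the synthetic demo data are hypothetical, the covariance matrix uses the n - 1 denominator, and the loadings are obtained from the closed form eigenvector × sqrt(eigenvalue) / standard deviation, which equals the Pearson correlation between a variable and a component.

```python
import numpy as np

def pca_five_steps(data):
    """Hypothetical helper following the five steps listed above."""
    X = np.asarray(data, dtype=float)                  # Step 1: the data
    centered = X - X.mean(axis=0)                      # Step 2: subtract the means

    cov = np.cov(centered, rowvar=False)               # Step 3: eigen-decomposition
    eigenvalues, eigenvectors = np.linalg.eigh(cov)    #         of the covariance matrix
    order = np.argsort(eigenvalues)[::-1]              # largest eigenvalue first
    eigenvalues = np.clip(eigenvalues[order], 0.0, None)
    eigenvectors = eigenvectors[:, order]

    scores = centered @ eigenvectors                   # Step 4: project the observations

    # Step 5: loadings = correlation(variable j, component k).
    std = X.std(axis=0, ddof=1)
    loadings = eigenvectors * np.sqrt(eigenvalues) / std[:, None]
    return eigenvalues, eigenvectors, scores, loadings

# Synthetic demo data (an assumption for illustration): two correlated
# variables plus one independent variable.
rng = np.random.default_rng(0)
base = rng.normal(size=(30, 1))
data = np.hstack([
    base + 0.3 * rng.normal(size=(30, 1)),
    2 * base + 0.3 * rng.normal(size=(30, 1)),
    rng.normal(size=(30, 1)),
])

eigenvalues, eigenvectors, scores, loadings = pca_five_steps(data)
print(np.round(eigenvalues, 3))
print(np.round(loadings, 2))
```

The components returned in `scores` are uncorrelated, and their variances are the eigenvalues in decreasing order, which is the property the definition of PCA relies on.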
![Page 16: Descriptive Analysis and PCA](https://reader036.vdocuments.net/reader036/viewer/2022062409/568151b6550346895dbfe34a/html5/thumbnails/16.jpg)
A 2D example: step 1 get the data
20 words, each described by two variables:
Variable 1 = number of letters in the word
Variable 2 = number of lines used to define the word in the dictionary
![Page 17: Descriptive Analysis and PCA](https://reader036.vdocuments.net/reader036/viewer/2022062409/568151b6550346895dbfe34a/html5/thumbnails/17.jpg)
A 2D example: step 1 get the data
![Page 18: Descriptive Analysis and PCA](https://reader036.vdocuments.net/reader036/viewer/2022062409/568151b6550346895dbfe34a/html5/thumbnails/18.jpg)
A 2D example: step 2 subtract the mean
Y = "length of the words", M_Y = 6, y = (Y − M_Y)
W = "number of lines of the definition", M_W = 8, w = (W − M_W)
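A short sketch of the centering step. The 20-word table itself was an image and is not in this transcript, so the values below are an assumption: they are the ones I recall from the version of this example published by Abdi and Williams (2010), and they do reproduce the two means quoted on the slide (M_Y = 6 and M_W = 8).

```python
import numpy as np

# ASSUMED data (not in the transcript): word length Y and number of
# definition lines W for the 20 words of the example.
Y = np.array([3, 6, 2, 6, 2, 9, 6, 5, 9, 4, 7, 11, 5, 4, 3, 9, 10, 5, 4, 10], dtype=float)
W = np.array([14, 7, 11, 9, 9, 4, 8, 11, 5, 8, 2, 4, 12, 9, 8, 1, 4, 13, 15, 6], dtype=float)

M_Y, M_W = Y.mean(), W.mean()   # means of the two variables
y = Y - M_Y                     # centered word length
w = W - M_W                     # centered number of lines

print(M_Y, M_W)                 # 6.0 8.0
print(y.sum(), w.sum())         # centered variables sum to zero
```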
![Page 19: Descriptive Analysis and PCA](https://reader036.vdocuments.net/reader036/viewer/2022062409/568151b6550346895dbfe34a/html5/thumbnails/19.jpg)
A 2D example: step 2 subtract the mean
![Page 20: Descriptive Analysis and PCA](https://reader036.vdocuments.net/reader036/viewer/2022062409/568151b6550346895dbfe34a/html5/thumbnails/20.jpg)
A 2D example: step 3 find the eigenvectors
![Page 21: Descriptive Analysis and PCA](https://reader036.vdocuments.net/reader036/viewer/2022062409/568151b6550346895dbfe34a/html5/thumbnails/21.jpg)
A 2D example: step 3 find the eigenvectors
![Page 22: Descriptive Analysis and PCA](https://reader036.vdocuments.net/reader036/viewer/2022062409/568151b6550346895dbfe34a/html5/thumbnails/22.jpg)
A 2D example: project the observations
![Page 23: Descriptive Analysis and PCA](https://reader036.vdocuments.net/reader036/viewer/2022062409/568151b6550346895dbfe34a/html5/thumbnails/23.jpg)
A 2D example: project the observations
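Steps 3 and 4 of the 2D example can be sketched as follows. The word data are an assumption (the table is not in this transcript; the values are the ones I recall from Abdi and Williams, 2010). One detail worth flagging: the sketch diagonalizes the cross-product matrix of the centered data, which is the covariance matrix multiplied by a constant. The eigenvectors are the same, and the eigenvalues obtained this way (392 and 52) are the ones that appear on the explained-variance slide.

```python
import numpy as np

# ASSUMED data (not in the transcript): word length Y, definition lines W.
Y = np.array([3, 6, 2, 6, 2, 9, 6, 5, 9, 4, 7, 11, 5, 4, 3, 9, 10, 5, 4, 10], dtype=float)
W = np.array([14, 7, 11, 9, 9, 4, 8, 11, 5, 8, 2, 4, 12, 9, 8, 1, 4, 13, 15, 6], dtype=float)

# Centered 20 x 2 table (columns: y, w).
Z = np.column_stack([Y - Y.mean(), W - W.mean()])

# Step 3: eigenvectors and eigenvalues of the cross-product matrix.
S = Z.T @ Z                                   # [[150, -154], [-154, 294]]
eigenvalues, Q = np.linalg.eigh(S)
order = np.argsort(eigenvalues)[::-1]         # largest eigenvalue first
eigenvalues, Q = eigenvalues[order], Q[:, order]

# Step 4: project the observations onto the eigenvectors.
F = Z @ Q                                     # column 0 = F1, column 1 = F2

print(np.round(eigenvalues, 6))               # eigenvalues 392 and 52
print(np.round((F**2).sum(axis=0), 6))        # sum of squared scores per component
```

The sum of squared scores on each component equals the corresponding eigenvalue, which is why the eigenvalues can be read as amounts of variance (inertia) carried by the components.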
![Page 24: Descriptive Analysis and PCA](https://reader036.vdocuments.net/reader036/viewer/2022062409/568151b6550346895dbfe34a/html5/thumbnails/24.jpg)
A 2D example: compute the loadings
r (W, F1) = 0.97
Pearson correlation coefficient
![Page 25: Descriptive Analysis and PCA](https://reader036.vdocuments.net/reader036/viewer/2022062409/568151b6550346895dbfe34a/html5/thumbnails/25.jpg)
A 2D example: compute the loadings
r (W, F2) = 0.23
Pearson correlation coefficient
![Page 26: Descriptive Analysis and PCA](https://reader036.vdocuments.net/reader036/viewer/2022062409/568151b6550346895dbfe34a/html5/thumbnails/26.jpg)
A 2D example: compute the loadings
r (Y, F1) = -0.87
Pearson correlation coefficient
![Page 27: Descriptive Analysis and PCA](https://reader036.vdocuments.net/reader036/viewer/2022062409/568151b6550346895dbfe34a/html5/thumbnails/27.jpg)
A 2D example: compute the loadings
r (Y, F2) = 0.50
Pearson correlation coefficient
![Page 28: Descriptive Analysis and PCA](https://reader036.vdocuments.net/reader036/viewer/2022062409/568151b6550346895dbfe34a/html5/thumbnails/28.jpg)
A 2D example: draw the circle of correlation
r (W, F1) = 0.97
r (W, F2) = 0.23
r (Y, F1) = -0.87
r (Y, F2) = 0.50
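The four correlations above are the coordinates of W and Y in the circle of correlations. The sketch below recomputes them as plain Pearson correlations between each variable and each component. The word data are an assumption (not in this transcript; values as I recall them from Abdi and Williams, 2010). Because the sign of an eigenvector is arbitrary, the code orients each component so that W has a positive coefficient, which matches the signs shown on the slides.

```python
import numpy as np

# ASSUMED data (not in the transcript): word length Y, definition lines W.
Y = np.array([3, 6, 2, 6, 2, 9, 6, 5, 9, 4, 7, 11, 5, 4, 3, 9, 10, 5, 4, 10], dtype=float)
W = np.array([14, 7, 11, 9, 9, 4, 8, 11, 5, 8, 2, 4, 12, 9, 8, 1, 4, 13, 15, 6], dtype=float)

Z = np.column_stack([Y - Y.mean(), W - W.mean()])   # columns: y, w
eigenvalues, Q = np.linalg.eigh(Z.T @ Z)
order = np.argsort(eigenvalues)[::-1]
Q = Q[:, order]
Q = Q * np.sign(Q[1, :])     # orient each component so W's coefficient is positive

F = Z @ Q                    # factor scores: F[:, 0] = F1, F[:, 1] = F2

def pearson(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]

loadings = {
    ("W", "F1"): pearson(W, F[:, 0]),
    ("W", "F2"): pearson(W, F[:, 1]),
    ("Y", "F1"): pearson(Y, F[:, 0]),
    ("Y", "F2"): pearson(Y, F[:, 1]),
}
for (var, comp), r in loadings.items():
    print(f"r({var}, {comp}) = {r:+.2f}")

# Each variable lies on the unit circle when both components are kept:
# the squared loadings of a variable sum to 1.
print(loadings[("W", "F1")]**2 + loadings[("W", "F2")]**2)
```

With only two variables and two components, both points fall exactly on the circle; with more variables than plotted components, points inside the circle are variables that the two axes represent only partially.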
![Page 29: Descriptive Analysis and PCA](https://reader036.vdocuments.net/reader036/viewer/2022062409/568151b6550346895dbfe34a/html5/thumbnails/29.jpg)
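How to compute the explained variance?

| Component | Eigenvalue | % variance | Cumulated % variance |
|-----------|------------|------------|----------------------|
| 1         | 392        | 88         | 88                   |
| 2         | 52         | 12         | 100                  |
| Total     | 444        |            |                      |

(392 / 444) × 100 = 88%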
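The same arithmetic, written out as a small sketch that uses only the two eigenvalues given on the slide (the variable names are illustrative):

```python
from itertools import accumulate

eigenvalues = [392, 52]                      # from the 2D word example
total = sum(eigenvalues)                     # 444

percent = [100 * ev / total for ev in eigenvalues]
cumulative = list(accumulate(percent))

for k, (ev, p, c) in enumerate(zip(eigenvalues, percent, cumulative), start=1):
    print(f"Component {k}: eigenvalue = {ev}, % variance = {p:.0f}, cumulated = {c:.0f}")
```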
![Page 30: Descriptive Analysis and PCA](https://reader036.vdocuments.net/reader036/viewer/2022062409/568151b6550346895dbfe34a/html5/thumbnails/30.jpg)
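How many components to keep?
The Kaiser criterion: retain only the components with eigenvalues greater than 1 (with standardized variables, a component with an eigenvalue below 1 explains less variance than a single original variable).
The scree test: plot the eigenvalues in decreasing order and keep the components that come before the curve levels off.
Common sense: keep the dimensions that are interpretable. Examine several solutions and choose the one that makes the most sense.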
[Figure: scree plot of the eigenvalues (vertical axis, 0 to 4) against the component number (horizontal axis, 1 to 8).]
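A minimal sketch of the first two rules. The eigenvalues of the scree plot are not recoverable from the transcript, so the list below is hypothetical: eight values that sum to 8, as they would for a PCA on eight standardized variables. The function name `kaiser_count` is also an illustrative choice.

```python
# HYPOTHETICAL eigenvalues for 8 standardized variables (they sum to 8).
eigenvalues = [3.6, 1.9, 0.9, 0.6, 0.4, 0.3, 0.2, 0.1]

def kaiser_count(values):
    """Number of components whose eigenvalue is greater than 1."""
    return sum(1 for v in values if v > 1)

# Scree test helper: the drop between successive eigenvalues.
# A large drop followed by small ones suggests where the curve levels off.
drops = [a - b for a, b in zip(eigenvalues, eigenvalues[1:])]

print("Kaiser criterion keeps", kaiser_count(eigenvalues), "components")
print("Successive drops:", [round(d, 2) for d in drops])
```

Both rules are heuristics, which is why the slide adds interpretability as the final check.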
![Page 31: Descriptive Analysis and PCA](https://reader036.vdocuments.net/reader036/viewer/2022062409/568151b6550346895dbfe34a/html5/thumbnails/31.jpg)
Should I normalize the data?
Yes, if the variables are not measured on the same scale.
Otherwise it depends:
Normalized: same weight for all variables
Not normalized: weight proportional to the standard deviation
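The effect of normalization can be seen on a toy table with two variables measured on very different scales. The data and the helper name `first_axis` below are made up for illustration. Without normalization the first axis is almost entirely the large-scale variable; with normalization (PCA on the correlation matrix) both variables weigh the same.

```python
import numpy as np

# Made-up data: column 0 is on a scale of units, column 1 on a scale of hundreds.
X = np.array([
    [1.0, 100.0],
    [2.0, 300.0],
    [3.0, 200.0],
    [4.0, 500.0],
    [5.0, 400.0],
])

def first_axis(data, normalize):
    """First principal axis (unit eigenvector with the largest eigenvalue)."""
    Z = data - data.mean(axis=0)
    if normalize:
        Z = Z / data.std(axis=0, ddof=1)     # unit variance for every variable
    values, vectors = np.linalg.eigh(np.cov(Z, rowvar=False))
    return vectors[:, np.argmax(values)]

raw_axis = first_axis(X, normalize=False)
norm_axis = first_axis(X, normalize=True)

print("Not normalized:", np.round(np.abs(raw_axis), 3))
print("Normalized:    ", np.round(np.abs(norm_axis), 3))
```

The absolute values are printed because the sign of an eigenvector is arbitrary.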