![Page 1: BIOSYST-MeBioS The potential of Functional Data Analysis for Chemometrics Dirk De Becker, Wouter Saeys, Bart De Ketelaere and Paul Darius](https://reader035.vdocuments.net/reader035/viewer/2022070400/56649f135503460f94c277ac/html5/thumbnails/1.jpg)
BIOSYST-MeBioS www.biw.kuleuven.be
The potential of Functional Data
Analysis for Chemometrics
Dirk De Becker, Wouter Saeys,
Bart De Ketelaere and Paul Darius
![Page 2: BIOSYST-MeBioS The potential of Functional Data Analysis for Chemometrics Dirk De Becker, Wouter Saeys, Bart De Ketelaere and Paul Darius](https://reader035.vdocuments.net/reader035/viewer/2022070400/56649f135503460f94c277ac/html5/thumbnails/2.jpg)
The Potential of FDA for Chemometrics
Introduction to FDA
Introduction to Chemometrics
Using FDA in chemometrics
For prediction
For Analysis Of Variance
Conclusions
![Page 3: BIOSYST-MeBioS The potential of Functional Data Analysis for Chemometrics Dirk De Becker, Wouter Saeys, Bart De Ketelaere and Paul Darius](https://reader035.vdocuments.net/reader035/viewer/2022070400/56649f135503460f94c277ac/html5/thumbnails/3.jpg)
What is Functional Data Analysis?
Developed by Ramsay & Silverman (1997)
Analyse data by approximating it with some kind of functional basis
Mainly for longitudinal data
High correlation between neighbouring data points
![Page 4: BIOSYST-MeBioS The potential of Functional Data Analysis for Chemometrics Dirk De Becker, Wouter Saeys, Bart De Ketelaere and Paul Darius](https://reader035.vdocuments.net/reader035/viewer/2022070400/56649f135503460f94c277ac/html5/thumbnails/4.jpg)
Why use FDA?
Data as a single entity <-> individual observations
Make a function of your data
Derivatives
Reduce the amount of data
Noise -> smoothing
Impose known properties on the data
Monotonicity, non-negativity, smoothness, ...
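A minimal NumPy sketch of these points follows. It assumes a synthetic periodic signal and a 7-function Fourier basis, both illustrative choices that do not come from the slides. The 201 noisy observations become 7 coefficients. The fitted function is smoother than the raw data, and its derivative follows from the same coefficients.

```python
import numpy as np


def fourier_basis(t, n_basis, period=1.0):
    """Columns 1, sin(wt), cos(wt), sin(2wt), cos(2wt), ... (n_basis in total)."""
    omega = 2 * np.pi / period
    cols = [np.ones_like(t)]
    for k in range(1, n_basis // 2 + 1):
        cols += [np.sin(k * omega * t), np.cos(k * omega * t)]
    return np.column_stack(cols[:n_basis])


def fourier_basis_deriv(t, n_basis, period=1.0):
    """First derivative of each column of fourier_basis."""
    omega = 2 * np.pi / period
    cols = [np.zeros_like(t)]
    for k in range(1, n_basis // 2 + 1):
        cols += [k * omega * np.cos(k * omega * t), -k * omega * np.sin(k * omega * t)]
    return np.column_stack(cols[:n_basis])


rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 201, endpoint=False)
true_curve = np.sin(2 * np.pi * t) + 0.5 * np.cos(4 * np.pi * t)
y = true_curve + rng.normal(scale=0.2, size=t.size)      # 201 noisy observations

Phi = fourier_basis(t, n_basis=7)
coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)            # 201 values -> 7 coefficients
smooth = Phi @ coef                                       # the data as one smooth function
slope = fourier_basis_deriv(t, 7) @ coef                  # its derivative, from the same coefficients
```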
![Page 5: BIOSYST-MeBioS The potential of Functional Data Analysis for Chemometrics Dirk De Becker, Wouter Saeys, Bart De Ketelaere and Paul Darius](https://reader035.vdocuments.net/reader035/viewer/2022070400/56649f135503460f94c277ac/html5/thumbnails/5.jpg)
Basis Functions?
Polynomials: 1, t, t², t³, ...
Fourier: 1, sin(ωt), cos(ωt), sin(2ωt), cos(2ωt), ...
Splines
Wavelets
The choice depends on your data
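The sketch below shows why the choice depends on the data. It fits one periodic and one non-periodic curve (both hypothetical) with 5 polynomial and 5 Fourier basis functions, using NumPy least squares. Each basis reproduces the curve of its own type exactly and fits the other type only approximately.

```python
import numpy as np


def polynomial_basis(t, n_basis):
    """Columns 1, t, t^2, ..., t^(n_basis - 1)."""
    return np.vander(t, n_basis, increasing=True)


def fourier_basis(t, n_basis, period=1.0):
    """Columns 1, sin(wt), cos(wt), sin(2wt), cos(2wt), ... (n_basis in total)."""
    omega = 2 * np.pi / period
    cols = [np.ones_like(t)]
    for k in range(1, n_basis // 2 + 1):
        cols += [np.sin(k * omega * t), np.cos(k * omega * t)]
    return np.column_stack(cols[:n_basis])


def fit_rms(basis, y):
    """Least-squares projection of y onto the basis columns; returns the RMS residual."""
    coef, *_ = np.linalg.lstsq(basis, y, rcond=None)
    return np.sqrt(np.mean((y - basis @ coef) ** 2))


t = np.linspace(0.0, 1.0, 200, endpoint=False)
signals = {
    "periodic": np.sin(2 * np.pi * t) + 0.3 * np.cos(4 * np.pi * t),
    "trend": 1.0 + 2.0 * t - 1.5 * t ** 2,               # smooth but not periodic
}
results = {
    name: {"polynomial": fit_rms(polynomial_basis(t, 5), y),
           "fourier": fit_rms(fourier_basis(t, 5), y)}
    for name, y in signals.items()
}
for name, res in results.items():
    print(name, {k: round(v, 4) for k, v in res.items()})
```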
![Page 6: BIOSYST-MeBioS The potential of Functional Data Analysis for Chemometrics Dirk De Becker, Wouter Saeys, Bart De Ketelaere and Paul Darius](https://reader035.vdocuments.net/reader035/viewer/2022070400/56649f135503460f94c277ac/html5/thumbnails/6.jpg)
Chemometrics
Measure optical properties of a material
Transmission or reflection of light
At a large number of wavelengths
Use these properties to predict other properties of the material (e.g. its composition)
![Page 7: BIOSYST-MeBioS The potential of Functional Data Analysis for Chemometrics Dirk De Becker, Wouter Saeys, Bart De Ketelaere and Paul Darius](https://reader035.vdocuments.net/reader035/viewer/2022070400/56649f135503460f94c277ac/html5/thumbnails/7.jpg)
Why Chemometrics?
Fast
Cheap
Non-destructive
Environment-friendly
![Page 8: BIOSYST-MeBioS The potential of Functional Data Analysis for Chemometrics Dirk De Becker, Wouter Saeys, Bart De Ketelaere and Paul Darius](https://reader035.vdocuments.net/reader035/viewer/2022070400/56649f135503460f94c277ac/html5/thumbnails/8.jpg)
Classical methods
Ignore the correlation between neighbouring wavelengths:
![Page 9: BIOSYST-MeBioS The potential of Functional Data Analysis for Chemometrics Dirk De Becker, Wouter Saeys, Bart De Ketelaere and Paul Darius](https://reader035.vdocuments.net/reader035/viewer/2022070400/56649f135503460f94c277ac/html5/thumbnails/9.jpg)
FDA in chemometrics
NIR spectra
Absorption peaks
Width and height
Basis: B-splines
Resemble the shape of absorption peaks
Preserve the vicinity constraint
![Page 10: BIOSYST-MeBioS The potential of Functional Data Analysis for Chemometrics Dirk De Becker, Wouter Saeys, Bart De Ketelaere and Paul Darius](https://reader035.vdocuments.net/reader035/viewer/2022070400/56649f135503460f94c277ac/html5/thumbnails/10.jpg)
Spline Functions
Polynomials of order m joined piecewise
Fast evaluation
Continuity of derivatives
Up to order m-2
At the L interior knots
Degrees of freedom: L + m
Flexible
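The L + m count can be checked numerically. Below is a minimal NumPy sketch of the Cox-de Boor recursion. It assumes clamped boundary knots, meaning each boundary knot is repeated as many times as the spline order. `bspline_basis` is a hypothetical helper name; `scipy.interpolate.BSpline` offers an equivalent library implementation.

```python
import numpy as np


def bspline_basis(x, interior_knots, order, lo, hi):
    """All B-splines of the given order (degree order - 1) on [lo, hi], evaluated at x.

    Cox-de Boor recursion with `order`-fold boundary knots; returns an array of
    shape (len(x), len(interior_knots) + order), i.e. L + m columns.
    """
    x = np.asarray(x, dtype=float)
    t = np.concatenate([np.full(order, lo), np.sort(interior_knots), np.full(order, hi)])
    # Order 1: indicator of each knot interval [t_j, t_{j+1})
    B = np.array([(t[j] <= x) & (x < t[j + 1]) for j in range(len(t) - 1)], dtype=float).T
    last = np.nonzero(t[:-1] < t[1:])[0][-1]
    B[x == hi, last] = 1.0                        # close the last interval at x = hi
    for k in range(2, order + 1):                 # raise the order one step at a time
        cols = []
        for j in range(len(t) - k):
            col = np.zeros_like(x)
            d1, d2 = t[j + k - 1] - t[j], t[j + k] - t[j + 1]
            if d1 > 0:
                col += (x - t[j]) / d1 * B[:, j]
            if d2 > 0:
                col += (t[j + k] - x) / d2 * B[:, j + 1]
            cols.append(col)
        B = np.column_stack(cols)
    return B


L, m = 5, 4                                        # 5 interior knots, cubic splines (order 4)
x = np.linspace(0.0, 1.0, 101)
knots = np.linspace(0.0, 1.0, L + 2)[1:-1]         # equally spaced interior knots
B = bspline_basis(x, knots, order=m, lo=0.0, hi=1.0)
print(B.shape)                                     # (101, 9): L + m basis functions
print(np.allclose(B.sum(axis=1), 1.0))             # True: the B-splines sum to one
```

Each cubic B-spline is non-zero over only four adjacent knot intervals, so it has a local, peak-like shape.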
![Page 11: BIOSYST-MeBioS The potential of Functional Data Analysis for Chemometrics Dirk De Becker, Wouter Saeys, Bart De Ketelaere and Paul Darius](https://reader035.vdocuments.net/reader035/viewer/2022070400/56649f135503460f94c277ac/html5/thumbnails/11.jpg)
![Page 12: BIOSYST-MeBioS The potential of Functional Data Analysis for Chemometrics Dirk De Becker, Wouter Saeys, Bart De Ketelaere and Paul Darius](https://reader035.vdocuments.net/reader035/viewer/2022070400/56649f135503460f94c277ac/html5/thumbnails/12.jpg)
Constructing a spline basis
Order
Depends on what the model will be used for
Mostly cubic splines (order 4)
Number and position of knots
Use enough knots to capture the features of the data
Look at the data
Beware of overfitting
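A held-out set shows the trade-off between using enough knots and overfitting. This minimal sketch rests on several assumptions: a synthetic two-peak curve, equally spaced knots, and every third point held out for validation. For brevity it swaps in a truncated power basis. That basis spans the same cubic spline space as cubic B-splines with the same knots, but it is less well conditioned.

```python
import numpy as np


def cubic_spline_basis(x, knots):
    """Truncated power basis 1, x, x^2, x^3 and (x - kappa)_+^3 per interior knot.

    Spans the same cubic spline space as cubic B-splines with these knots
    (L + 4 functions) but is less well conditioned; used here only for brevity.
    """
    cols = [x ** p for p in range(4)]
    cols += [np.clip(x - kappa, 0.0, None) ** 3 for kappa in knots]
    return np.column_stack(cols)


rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 1.0, 300))
signal = np.exp(-((x - 0.3) / 0.05) ** 2) + 0.6 * np.exp(-((x - 0.7) / 0.1) ** 2)
y = signal + rng.normal(scale=0.05, size=x.size)
valid = np.arange(x.size) % 3 == 0                 # every third point held out
train = ~valid

errors = {}
for n_knots in (2, 5, 10, 20, 40, 80):
    knots = np.linspace(0.0, 1.0, n_knots + 2)[1:-1]    # equally spaced interior knots
    basis = cubic_spline_basis(x, knots)
    coef, *_ = np.linalg.lstsq(basis[train], y[train], rcond=None)
    errors[n_knots] = np.sqrt(np.mean((y[valid] - basis[valid] @ coef) ** 2))

best = min(errors, key=errors.get)
for n_knots, err in errors.items():
    print(f"{n_knots:3d} knots: validation RMSE {err:.4f}")
print("selected:", best)
```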
![Page 13: BIOSYST-MeBioS The potential of Functional Data Analysis for Chemometrics Dirk De Becker, Wouter Saeys, Bart De Ketelaere and Paul Darius](https://reader035.vdocuments.net/reader035/viewer/2022070400/56649f135503460f94c277ac/html5/thumbnails/13.jpg)
Position of knots
More variation -> more knots
[Figure: two spline fits (x axis 0 to 2000, y axis "values"), panels "54 knots, equally spaced" and "54 knots, tuned"]
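One way to put more knots where there is more variation is to space them evenly in cumulative absolute second difference. The rule, the `floor` parameter and the test curve are assumptions for illustration. The test curve has a sharp and a broad peak on 0 to 2000. This is not necessarily the tuning procedure behind the figure.

```python
import numpy as np


def variation_knots(x, y, n_knots, floor=0.2):
    """Interior knots placed so each interval holds an equal share of 'variation'.

    Variation density = |second difference of y| plus a uniform floor
    (floor * mean density) so flat regions still receive some knots.
    """
    curv = np.abs(np.diff(y, n=2))
    curv = np.concatenate([curv[:1], curv, curv[-1:]])           # pad back to len(x)
    density = curv + floor * curv.mean() + 1e-12
    # cumulative "variation" along x (trapezoid rule), scaled to [0, 1]
    cum = np.concatenate([[0.0], np.cumsum(0.5 * (density[1:] + density[:-1]) * np.diff(x))])
    cum /= cum[-1]
    levels = np.linspace(0.0, 1.0, n_knots + 2)[1:-1]
    return np.interp(levels, cum, x)                             # invert the cumulative curve


x = np.linspace(0.0, 2000.0, 2001)
y = 3.0 + np.exp(-((x - 500.0) / 20.0) ** 2) + 0.5 * np.exp(-((x - 1500.0) / 300.0) ** 2)
tuned = variation_knots(x, y, n_knots=54)
equal = np.linspace(x[0], x[-1], 56)[1:-1]                     # 54 equally spaced knots
print("knots within 100 of the sharp peak:",
      np.sum(np.abs(tuned - 500.0) < 100.0), "tuned vs",
      np.sum(np.abs(equal - 500.0) < 100.0), "equally spaced")
```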
![Page 14: BIOSYST-MeBioS The potential of Functional Data Analysis for Chemometrics Dirk De Becker, Wouter Saeys, Bart De Ketelaere and Paul Darius](https://reader035.vdocuments.net/reader035/viewer/2022070400/56649f135503460f94c277ac/html5/thumbnails/14.jpg)
B-spline approximation
![Page 15: BIOSYST-MeBioS The potential of Functional Data Analysis for Chemometrics Dirk De Becker, Wouter Saeys, Bart De Ketelaere and Paul Darius](https://reader035.vdocuments.net/reader035/viewer/2022070400/56649f135503460f94c277ac/html5/thumbnails/15.jpg)
FDA for prediction
Functional regression models
P-Spline Regression (Marx and Eilers)
Non-Parametric Functional Data Analysis (Ferraty and Vieu)
![Page 16: BIOSYST-MeBioS The potential of Functional Data Analysis for Chemometrics Dirk De Becker, Wouter Saeys, Bart De Ketelaere and Paul Darius](https://reader035.vdocuments.net/reader035/viewer/2022070400/56649f135503460f94c277ac/html5/thumbnails/16.jpg)
Functional Regression Models
Project the spectra onto a spline basis
Apply Multivariate Linear Regression to the spline coefficients
Great reduction in model complexity: spline coefficients instead of individual wavelengths
Natural shape of absorption peaks is used
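The two steps can be sketched in NumPy on hypothetical data. The sketch uses synthetic two-peak spectra on an assumed 1000 to 2500 nm grid and 20 cubic B-splines built with a compact Cox-de Boor recursion. It then fits ordinary least squares on the coefficients plus an intercept. All function names are hypothetical, and the PLS comparison from the case study is not reproduced.

```python
import numpy as np


def bspline_basis(x, interior_knots, order, lo, hi):
    """B-splines of the given order on [lo, hi] at x (Cox-de Boor, clamped knots)."""
    x = np.asarray(x, dtype=float)
    t = np.concatenate([np.full(order, lo), np.sort(interior_knots), np.full(order, hi)])
    B = np.array([(t[j] <= x) & (x < t[j + 1]) for j in range(len(t) - 1)], dtype=float).T
    B[x == hi, np.nonzero(t[:-1] < t[1:])[0][-1]] = 1.0    # include the right end point
    for k in range(2, order + 1):
        cols = []
        for j in range(len(t) - k):
            col = np.zeros_like(x)
            d1, d2 = t[j + k - 1] - t[j], t[j + k] - t[j + 1]
            if d1 > 0:
                col += (x - t[j]) / d1 * B[:, j]
            if d2 > 0:
                col += (t[j + k] - x) / d2 * B[:, j + 1]
            cols.append(col)
        B = np.column_stack(cols)
    return B


def project_onto_basis(spectra, basis):
    """Least-squares spline coefficients of each spectrum (one spectrum per row)."""
    coef, *_ = np.linalg.lstsq(basis, spectra.T, rcond=None)
    return coef.T                                           # (n_samples, n_basis)


def fit_functional_regression(spectra, y, basis):
    """Linear regression of y on the spline coefficients, with an intercept."""
    C = project_onto_basis(spectra, basis)
    beta, *_ = np.linalg.lstsq(np.column_stack([np.ones(len(y)), C]), y, rcond=None)
    return beta


def predict(spectra, beta, basis):
    C = project_onto_basis(spectra, basis)
    return np.column_stack([np.ones(C.shape[0]), C]) @ beta


def peak(wl, centre, width):
    return np.exp(-((wl - centre) / width) ** 2)


# Hypothetical data: two absorption peaks whose heights h1, h2 vary per sample
rng = np.random.default_rng(2)
wl = np.linspace(1000.0, 2500.0, 301)                       # assumed wavelength grid (nm)
h1, h2 = rng.uniform(0.5, 1.5, size=(2, 120))
spectra = np.outer(h1, peak(wl, 1450.0, 60.0)) + np.outer(h2, peak(wl, 1940.0, 80.0))
spectra += rng.normal(scale=0.01, size=spectra.shape)
y = 2.0 * h1 - 1.0 * h2 + rng.normal(scale=0.05, size=h1.size)

knots = np.linspace(wl[0], wl[-1], 18)[1:-1]                # 16 interior knots -> 20 B-splines
basis = bspline_basis(wl, knots, order=4, lo=wl[0], hi=wl[-1])
beta = fit_functional_regression(spectra[:90], y[:90], basis)
rmsep = np.sqrt(np.mean((y[90:] - predict(spectra[90:], beta, basis)) ** 2))
print(f"{basis.shape[1]} spline coefficients instead of {wl.size} wavelengths, RMSEP = {rmsep:.3f}")
```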
![Page 17: BIOSYST-MeBioS The potential of Functional Data Analysis for Chemometrics Dirk De Becker, Wouter Saeys, Bart De Ketelaere and Paul Darius](https://reader035.vdocuments.net/reader035/viewer/2022070400/56649f135503460f94c277ac/html5/thumbnails/17.jpg)
Functional Regression Models: case study
420 samples of hog manure
Reflectance spectra
Total nitrogen (TN) and dry matter (DM) content
PLS and Functional Regression applied
![Page 18: BIOSYST-MeBioS The potential of Functional Data Analysis for Chemometrics Dirk De Becker, Wouter Saeys, Bart De Ketelaere and Paul Darius](https://reader035.vdocuments.net/reader035/viewer/2022070400/56649f135503460f94c277ac/html5/thumbnails/18.jpg)
Functional Regression: case study (cont'd)
![Page 19: BIOSYST-MeBioS The potential of Functional Data Analysis for Chemometrics Dirk De Becker, Wouter Saeys, Bart De Ketelaere and Paul Darius](https://reader035.vdocuments.net/reader035/viewer/2022070400/56649f135503460f94c277ac/html5/thumbnails/19.jpg)
Functional Regression: case study: results
Dry matter content:

| Dataset | FDA | PLS | # B-splines | # latent variables |
|---|---|---|---|---|
| 1 | 10.4069 | 10.3282 | 22 | 6 |
| 2 | 9.9084 | 10.565 | 20 | 6 |
| 3 | 10.4921 | 10.4857 | 22 | 6 |
| 4 | 10.4533 | 10.3236 | 22 | 6 |
| 5 | 9.1203 | 10.6019 | 23 | 6 |

Total nitrogen content:

| Dataset | FDA | PLS | # B-splines | # latent variables |
|---|---|---|---|---|
| 1 | 1.1922 | 1.2603 | 25 | 6 |
| 2 | 1.1582 | 1.1826 | 25 | 6 |
| 3 | 1.1806 | 1.2325 | 25 | 6 |
| 4 | 1.253 | 1.2852 | 25 | 6 |
| 5 | 1.1562 | 1.2664 | 25 | 6 |
![Page 20: BIOSYST-MeBioS The potential of Functional Data Analysis for Chemometrics Dirk De Becker, Wouter Saeys, Bart De Ketelaere and Paul Darius](https://reader035.vdocuments.net/reader035/viewer/2022070400/56649f135503460f94c277ac/html5/thumbnails/20.jpg)
P-Spline Regression (PSR)
By Marx and Eilers
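Construct the regression coefficients with B-splines: $\alpha = B\gamma$
Use a roughness penalty on $\gamma$, weighted by $\lambda$
Minimize $S = \|y - XB\gamma\|^2 + \lambda \|D\gamma\|^2$, where $D$ is a difference matrix
Full spectra are used for regression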
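Below is a minimal NumPy sketch of the PSR criterion. Its assumptions are cubic B-splines with equally spaced knots, a second-order difference matrix D, and centred X and y in place of an explicit intercept. It also uses an arbitrary λ = 1 on synthetic smooth spectra. It is not the authors' implementation, and it does not reproduce the case-study settings. Setting the gradient of S to zero gives the linear system $(U^TU + \lambda D^TD)\gamma = U^Ty$ with $U = XB$.

```python
import numpy as np


def bspline_basis(x, interior_knots, order, lo, hi):
    """B-splines of the given order on [lo, hi] at x (Cox-de Boor, clamped knots)."""
    x = np.asarray(x, dtype=float)
    t = np.concatenate([np.full(order, lo), np.sort(interior_knots), np.full(order, hi)])
    B = np.array([(t[j] <= x) & (x < t[j + 1]) for j in range(len(t) - 1)], dtype=float).T
    B[x == hi, np.nonzero(t[:-1] < t[1:])[0][-1]] = 1.0    # include the right end point
    for k in range(2, order + 1):
        cols = []
        for j in range(len(t) - k):
            col = np.zeros_like(x)
            d1, d2 = t[j + k - 1] - t[j], t[j + k] - t[j + 1]
            if d1 > 0:
                col += (x - t[j]) / d1 * B[:, j]
            if d2 > 0:
                col += (t[j + k] - x) / d2 * B[:, j + 1]
            cols.append(col)
        B = np.column_stack(cols)
    return B


def difference_matrix(n, d):
    """d-th order difference operator D, shape (n - d, n)."""
    return np.diff(np.eye(n), n=d, axis=0)


def psr_fit(X, y, basis, lam, d=2):
    """Minimise ||y - X B gamma||^2 + lam * ||D gamma||^2 on centred X and y.

    Returns the intercept and the coefficient curve alpha = B gamma.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    U = (X - x_mean) @ basis                       # n x K: spectra seen through the basis
    D = difference_matrix(basis.shape[1], d)
    gamma = np.linalg.solve(U.T @ U + lam * D.T @ D, U.T @ (y - y_mean))
    alpha = basis @ gamma
    return y_mean - x_mean @ alpha, alpha


# Hypothetical smooth "spectra": sums of five random Gaussian bands per sample
rng = np.random.default_rng(3)
grid = np.linspace(0.0, 1.0, 200)
centres = rng.uniform(0.0, 1.0, size=(100, 5, 1))
heights = rng.uniform(0.0, 1.0, size=(100, 5, 1))
X = (heights * np.exp(-((grid - centres) / 0.1) ** 2)).sum(axis=1)    # 100 x 200
alpha_true = np.sin(np.pi * grid) * (grid[1] - grid[0])
y = X @ alpha_true + rng.normal(scale=0.01, size=100)

knots = np.linspace(0.0, 1.0, 18)[1:-1]             # 20 cubic B-splines
basis = bspline_basis(grid, knots, order=4, lo=0.0, hi=1.0)
intercept, alpha = psr_fit(X[:70], y[:70], basis, lam=1.0)
rmsep = np.sqrt(np.mean((y[70:] - (intercept + X[70:] @ alpha)) ** 2))
print(f"RMSEP on 30 held-out samples: {rmsep:.4f}")
```

In practice λ (and the number of B-splines) would be tuned, for example by cross-validation; the value here is only a placeholder.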
![Page 21: BIOSYST-MeBioS The potential of Functional Data Analysis for Chemometrics Dirk De Becker, Wouter Saeys, Bart De Ketelaere and Paul Darius](https://reader035.vdocuments.net/reader035/viewer/2022070400/56649f135503460f94c277ac/html5/thumbnails/21.jpg)
P-Spline Regression: case study
121 samples of seed pills
y: humidity (%)
PLS: RMSEP = 1.19
PSR: RMSEP = 1.115
# B-spline coefficients = 7
λ = 0.001
![Page 22: BIOSYST-MeBioS The potential of Functional Data Analysis for Chemometrics Dirk De Becker, Wouter Saeys, Bart De Ketelaere and Paul Darius](https://reader035.vdocuments.net/reader035/viewer/2022070400/56649f135503460f94c277ac/html5/thumbnails/22.jpg)
Non-Parametric Functional Data Analysis
By F. Ferraty and P. Vieu
No regression model is involved
Prediction by applying local kernel functions in function space
So far, no good results have been obtained
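The idea can be sketched as a kernel-weighted average of the responses of nearby training curves. Here, "nearby" is measured by a semi-metric in function space. The sketch follows the spirit of Ferraty and Vieu but is not their code, and it rests on these assumptions:

- an L2 distance between first derivatives, which ignores constant baseline offsets;
- a quadratic kernel on [0, 1);
- a k-nearest-neighbour bandwidth;
- synthetic curves.

The toy example says nothing about performance on real spectra.

```python
import numpy as np


def semimetric_deriv(A, B, dx, q=1):
    """L2 distance between q-th derivatives (finite differences) of the rows of A and B."""
    dA = np.diff(A, n=q, axis=1) / dx ** q
    dB = np.diff(B, n=q, axis=1) / dx ** q
    diff = dA[:, None, :] - dB[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=2) * dx)       # shape (len(A), len(B))


def kernel_predict(X_train, y_train, X_new, dx, k=10, q=1):
    """Kernel-weighted local average of y_train in function space (no regression model)."""
    dist = semimetric_deriv(X_new, X_train, dx, q)
    h = np.sort(dist, axis=1)[:, k][:, None]             # bandwidth: distance to the (k+1)-th neighbour
    u = dist / h
    weights = np.where(u < 1.0, 1.0 - u ** 2, 0.0)       # quadratic kernel on [0, 1)
    return weights @ y_train / weights.sum(axis=1)


# Hypothetical curves: one peak of height a plus a random baseline offset c
rng = np.random.default_rng(4)
t = np.linspace(0.0, 1.0, 201)
dx = t[1] - t[0]
a = rng.uniform(0.5, 1.5, 200)
c = rng.uniform(-1.0, 1.0, 200)
X = a[:, None] * np.exp(-((t - 0.5) / 0.1) ** 2) + c[:, None]
y = a ** 2 + rng.normal(scale=0.02, size=a.size)         # nonlinear in the curves

pred = kernel_predict(X[:150], y[:150], X[150:], dx, k=10)
rmsep = np.sqrt(np.mean((y[150:] - pred) ** 2))
print(f"RMSEP on 50 held-out curves: {rmsep:.4f}")
```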
![Page 23: BIOSYST-MeBioS The potential of Functional Data Analysis for Chemometrics Dirk De Becker, Wouter Saeys, Bart De Ketelaere and Paul Darius](https://reader035.vdocuments.net/reader035/viewer/2022070400/56649f135503460f94c277ac/html5/thumbnails/23.jpg)
FDA in ANOVA setting: FANOVA
ANOVA:
“Study the relation between a response variable and one or more explanatory variables”
Functional one-way model: $x_{ig}(t) = \mu(t) + \alpha_g(t) + \epsilon_{ig}(t)$
$\mu(t)$ is the overall mean
$\alpha_g(t)$ are the effects of belonging to group $g$
$\epsilon_{ig}(t)$ are residuals
![Page 24: BIOSYST-MeBioS The potential of Functional Data Analysis for Chemometrics Dirk De Becker, Wouter Saeys, Bart De Ketelaere and Paul Darius](https://reader035.vdocuments.net/reader035/viewer/2022070400/56649f135503460f94c277ac/html5/thumbnails/24.jpg)
FANOVA: theory
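Constraint: $\sum_{g=1}^{m} \alpha_g(t) = 0$ for all $t$
Introduce $\beta = [\mu, \alpha_1, \ldots, \alpha_m]^T$ and a design matrix $Z$ so that $x(t) = Z\beta(t) + \epsilon(t)$
Introduce the functional aspect: $x(t) = C\phi(t)$ and $\beta(t) = B\phi(t)$, with $\phi$ the vector of basis functions
Constraint: introduce $Z^*$, $C^*$ and $x^*$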
![Page 25: BIOSYST-MeBioS The potential of Functional Data Analysis for Chemometrics Dirk De Becker, Wouter Saeys, Bart De Ketelaere and Paul Darius](https://reader035.vdocuments.net/reader035/viewer/2022070400/56649f135503460f94c277ac/html5/thumbnails/25.jpg)
FANOVA: goal and solution
Goal: estimate $B$ from $C$
Solution: $\hat{B} = (Z^{*T} Z^*)^{-1} Z^{*T} C^*$
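In coefficient form, the solution $\hat{B} = (Z^{*T}Z^*)^{-1}Z^{*T}C^*$ is one linear solve. Below is a minimal NumPy sketch for the one-way model, with simulated data. It assumes that each row of C holds one curve's coefficients in some basis φ. It also assumes that Z* and C* are Z and C extended with the constraint row (0, 1, ..., 1) and a zero row.

```python
import numpy as np


def one_way_design(groups, m):
    """Z: a column of ones (for mu) and one indicator column per group (for alpha_g)."""
    Z = np.zeros((len(groups), m + 1))
    Z[:, 0] = 1.0
    Z[np.arange(len(groups)), np.asarray(groups) + 1] = 1.0
    return Z


def fanova_fit(C, groups, m):
    """B_hat = (Z*^T Z*)^{-1} Z*^T C*; rows: mu, alpha_1, ..., alpha_m (as basis coefficients).

    Z* is Z plus the row (0, 1, ..., 1) and C* is C plus a row of zeros, which
    enforces sum_g alpha_g(t) = 0.
    """
    Z = one_way_design(groups, m)
    Z_star = np.vstack([Z, np.r_[0.0, np.ones(m)]])
    C_star = np.vstack([C, np.zeros(C.shape[1])])
    return np.linalg.solve(Z_star.T @ Z_star, Z_star.T @ C_star)


# Hypothetical example: 3 groups of 10 curves, each curve stored as 6 basis coefficients
rng = np.random.default_rng(5)
m, per_group, K = 3, 10, 6
groups = np.repeat(np.arange(m), per_group)
mu = rng.normal(size=K)
alpha = rng.normal(size=(m, K))
alpha -= alpha.mean(axis=0)                       # true effects satisfy the constraint
C = mu + alpha[groups] + rng.normal(scale=0.1, size=(m * per_group, K))
B_hat = fanova_fit(C, groups, m)
# beta(t) = B_hat @ phi(t) for whatever basis phi produced the coefficients
```

For a balanced design, the estimated mean is the average coefficient vector, and each effect is its group mean minus that average.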
![Page 26: BIOSYST-MeBioS The potential of Functional Data Analysis for Chemometrics Dirk De Becker, Wouter Saeys, Bart De Ketelaere and Paul Darius](https://reader035.vdocuments.net/reader035/viewer/2022070400/56649f135503460f94c277ac/html5/thumbnails/26.jpg)
FANOVA: significance testing
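Locally: $\mathrm{Contrast}(t) / \widehat{\mathrm{MSE}}(t)$ at each $t$, with $\widehat{\mathrm{MSE}}(t) = \frac{1}{df(\mathrm{error})} \sum_{i,g} \left[x_{ig}(t) - (Z\hat{\beta}(t))_{ig}\right]^2$
Globally: $M = \sup_t \mathrm{Contrast}(t) / \widehat{\mathrm{MSE}}(t)$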
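Below is a minimal sketch of local and global testing for the one-way case. It swaps in the ordinary one-way F-ratio as the pointwise statistic, with MSE(t) and df(error) = n - m in the denominator. It then assesses sup_t F(t) by permuting the group labels. Both choices are assumptions and may differ from the contrast statistic and reference distribution used on the slide.

```python
import numpy as np


def pointwise_f(curves, groups):
    """One-way ANOVA F-ratio at every grid point (curves: n_samples x n_points)."""
    labels = np.unique(groups)
    n, m = len(groups), len(labels)
    means = np.array([curves[groups == g].mean(axis=0) for g in labels])
    sizes = np.array([np.sum(groups == g) for g in labels])
    grand = curves.mean(axis=0)
    ms_between = (sizes[:, None] * (means - grand) ** 2).sum(axis=0) / (m - 1)
    fitted = means[np.searchsorted(labels, groups)]
    mse = ((curves - fitted) ** 2).sum(axis=0) / (n - m)     # MSE(t) with df(error) = n - m
    return ms_between / mse


def sup_f_test(curves, groups, n_perm=999, seed=0):
    """Global test: sup_t F(t) compared with its distribution under permuted labels."""
    rng = np.random.default_rng(seed)
    observed = pointwise_f(curves, groups).max()
    null = np.array([pointwise_f(curves, rng.permutation(groups)).max() for _ in range(n_perm)])
    return observed, (1 + np.sum(null >= observed)) / (n_perm + 1)


# Hypothetical data: 3 groups of 12 noisy curves; group 2 differs only around t = 0.7
rng = np.random.default_rng(6)
t = np.linspace(0.0, 1.0, 101)
groups = np.repeat([0, 1, 2], 12)
curves = np.sin(2 * np.pi * t) + rng.normal(scale=0.3, size=(groups.size, t.size))
curves[groups == 2] += 1.5 * np.exp(-((t - 0.7) / 0.05) ** 2)
F = pointwise_f(curves, groups)                     # local view: where do the groups differ?
sup_f, p = sup_f_test(curves, groups, n_perm=199)   # global view: do they differ anywhere?
print(f"largest F at t = {t[F.argmax()]:.2f}, sup F = {sup_f:.1f}, permutation p = {p:.3f}")
```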
![Page 27: BIOSYST-MeBioS The potential of Functional Data Analysis for Chemometrics Dirk De Becker, Wouter Saeys, Bart De Ketelaere and Paul Darius](https://reader035.vdocuments.net/reader035/viewer/2022070400/56649f135503460f94c277ac/html5/thumbnails/27.jpg)
FANOVA: case study
Spectra of manure
4 types of animals: dairy, beef, calf, hog
3 ambient temperatures: 4°C, 12°C, 20°C
3 sample temperatures: 4°C, 12°C, 20°C
9 replicates
=> 324 samples
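Model: $I_{ijkl}(t) = T_i(t) + A_j(t) + S_k(t) + \epsilon_{ijkl}(t)$
with $i \in [1,4]$ (animal type), $j \in [1,3]$ (ambient temperature), $k \in [1,3]$ (sample temperature), $l \in [1,9]$ (replicate)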
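The 4 × 3 × 3 × 9 design can be coded as a functional ANOVA design matrix. The sketch assumes one indicator column per factor level and sum-to-zero constraints on the two temperature effects; this parametrisation is itself an assumption. Random numbers stand in for the spline coefficients of the 324 spectra.

```python
import numpy as np
from itertools import product

# One row per sample (i, j, k, l): animal type, ambient temperature, sample temperature, replicate
design = np.array(list(product(range(4), range(3), range(3), range(9))))


def indicators(codes, n_levels):
    """One 0/1 column per level of a factor."""
    return np.eye(n_levels)[codes]


Z = np.hstack([indicators(design[:, 0], 4),       # columns 0-3: animal type
               indicators(design[:, 1], 3),       # columns 4-6: ambient temperature
               indicators(design[:, 2], 3)])      # columns 7-9: sample temperature
# Assumed parametrisation: both temperature effects sum to zero (animal effects keep the level)
constraints = np.zeros((2, Z.shape[1]))
constraints[0, 4:7] = 1.0
constraints[1, 7:10] = 1.0
Z_star = np.vstack([Z, constraints])

# Stand-in for the spline coefficients of the 324 spectra (hypothetical random numbers)
C = np.random.default_rng(7).normal(size=(len(design), 5))
C_star = np.vstack([C, np.zeros((2, C.shape[1]))])
B_hat = np.linalg.solve(Z_star.T @ Z_star, Z_star.T @ C_star)   # one row per effect
print(Z.shape, np.linalg.matrix_rank(Z), np.linalg.matrix_rank(Z_star))   # (324, 10) 8 10
```

Z alone has rank 8 because each factor's indicator columns sum to the same column of ones. The two constraint rows restore full rank, so Z*ᵀZ* can be inverted.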
![Page 28: BIOSYST-MeBioS The potential of Functional Data Analysis for Chemometrics Dirk De Becker, Wouter Saeys, Bart De Ketelaere and Paul Darius](https://reader035.vdocuments.net/reader035/viewer/2022070400/56649f135503460f94c277ac/html5/thumbnails/28.jpg)
FANOVA: case study (cont'd)
![Page 29: BIOSYST-MeBioS The potential of Functional Data Analysis for Chemometrics Dirk De Becker, Wouter Saeys, Bart De Ketelaere and Paul Darius](https://reader035.vdocuments.net/reader035/viewer/2022070400/56649f135503460f94c277ac/html5/thumbnails/29.jpg)
FANOVA: case study (cont'd)
![Page 30: BIOSYST-MeBioS The potential of Functional Data Analysis for Chemometrics Dirk De Becker, Wouter Saeys, Bart De Ketelaere and Paul Darius](https://reader035.vdocuments.net/reader035/viewer/2022070400/56649f135503460f94c277ac/html5/thumbnails/30.jpg)
Conclusions
Splines are a good basis for fitting spectral
data
Using FDA, it is possible to include vicinity
constraint in prediction models in
chemometrics
FANOVA is a good tool to explore the
variance in spectral data