data analysis in two-dimensional chromatography
TRANSCRIPT
Data analysis in two-dimensional chromatography
Data analysis in two-dimensional chromatography
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Gabriel Vivó TruyolsAnalytical-chemistry group
Van ‘t Hoff Institute for Molecular SciencesUniversity of Amsterdam
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
IntroductionFrom 1D chromatography to 2D chromatography: what does change?
2 4 6 8 10 12 14 16 18Time, min
0x100
4x106
8x106
1x107
Inte
nsity
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
The complexity of the data changesInstruments can be classified according to the order of the tensor of data used to represent a single experiment:
Zero-order instruments Produce Zero-order tensor
(e.g. a number) Example pH - meter
First-order instruments Produce First-order tensor
(e.g. a vector) Example UV-VIS spectrometer
Second-order instruments Produce Second-order tensor
(e.g. a matrix) Example LC-MS, GCxGC
Third-order instrument Produce Third-order tensor
(e.g. a “cube” of data) Example GCxGC-MS
nth-order instrument exist, but they are rare
The changes from 1D to 2D
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
The concept of “peak vicinity” changes
Two-dimensional
1tR2 t R
One-dimensional
tR
Peak of interest
Peak of interest
Neighbors Neighbors
S. Peters, G. Vivó-Truyols, P. Marriott, P.J. Schoenmakers, J. Chromatogr. A. 1146 (2007), 232-241.
The changes from 1D to 2D
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
The concept of “peak resolution” is different
A
B
One-dimensional
tR
Two-dimensional
1tR2 t R
f
g
Valley-to-peak ratio(between A and B) g
fP gfP
AB?
Valley-to-peak ratio(between A and B)?
S. Peters, G. Vivó-Truyols, P. Marriott, P.J. Schoenmakers, J. Chromatogr. A. 1146 (2007), 232-241.
The changes from 1D to 2D
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
The concept of “peak” changes
One-dimensional
tR
Two-dimensional
The changes from 1D to 2D
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
The concept of “peak” changes
One-dimensional
tR
Two-dimensional
The changes from 1D to 2D
GCxGC, Riva - 2014
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Do the main steps in data processing change?
Ste
p 2
Pre-process
Ste
p 3
MeasureS
tep
1View
• Base-line correction
• Noise filtering• Spike filtering• Alignment• … etc.
• Peak detection / integration
• Calibration• Deconvolution• Pattern
recognition
The changes from 1D to 2D
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Do the main steps in data processing change?
Ste
p 2
Pre-process
Ste
p 3
MeasureS
tep
1View
• Base-line correction
• Noise filtering• Spike filtering• Alignment• … etc.
• Peak detection / integration
• Calibration• Deconvolution• Pattern
recognition• Class separation
The changes from 1D to 2D
Basically, the steps are the same, but the algorithms used for each step may be different.
Univariate
Multivariate
• Folding• Phasing
First step: visualizationFirst step: visualization
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Ste
p 2
Pre-process
Ste
p 3
Measure
Ste
p 1
View
• Base-line correction
• Noise filtering• Spike filtering• Alignment• … etc.
• Peak detection / integration
• Calibration• Deconvolution• Pattern
recognition• Class separation
• Folding• Phasing
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
VisualisationRaw data in 2D chromatography
0 10 20 30Time, min
-101234
Abso
rban
ce, A
U
Chromatogram of Glycine preparation (254nm)(data courtesy of University of Valencia)
Two-dimensional chromatographyGCxGC chromatogram of diesel (FID detector)First dimension: non-polar; Second dimension: polar
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
“Folding” the chromatogram Visualisation
Mod time (8 seconds)
27002710
27202730
2740
02
46
80
2000
4000
6000
8000
10000
1tR, s2tR, s
FID
sig
nal
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
“Folding” the chromatogram Visualisation
27002720
2740
02
46
80
5000
10000
1tR, s2tR, s
FID
sig
nal
2700 2710 2720 2730 27400
1
2
3
4
5
6
7
8
1tR, s
2 t R, s
0
2000
4000
6000
8000
10000
Mod time (8 seconds)
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
“Folding” the chromatogram: phasing Visualisation
Mod time (8 seconds)
27002720
2740
46
8100
5000
10000
1tR, s2tR, s
FID
sig
nal
2700 2710 2720 2730 2740
3
4
5
6
7
8
9
10
1tR, s
2 t R, s
0
2000
4000
6000
8000
10000
Phase = 0.3
2700 2710 2720 2730 27404
5
6
7
8
9
10
11
12
1tR, s
2 t R, s
0
2000
4000
6000
8000
10000
27002720
2740
46
810
120
5000
10000
1tR, s2tR, s
FID
sig
nal
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
“Folding” the chromatogram: phasing Visualisation
Mod time (8 seconds)
Phase = 0.5
2700 2710 2720 2730 27406
7
8
9
10
11
12
13
14
1tR, s
2 t R, s
0
2000
4000
6000
8000
10000
27002720
2740
68
1012
140
5000
10000
1tR, s2tR, s
FID
sig
nal
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
“Folding” the chromatogram: phasing Visualisation
Mod time (8 seconds)
Phase = 0.75
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Cylindrical coordinates. An alternative way to represent the data.
Visualisation
J.J.A.M. Weusten, E.P.P.A. Derks , J.H.M. Mommers, S. van der Wal, Anal. Chim. Acta 726 (2012), 9
1tR
=2tR
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Interpolation Visualisation
27002720
2740
46
8100
5000
10000
1tR, s2tR, s
FID
sig
nal
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
+ =
Welcome to the magic world of chemometrics!
Interpolation Visualisation
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
“Folding” the chromatogram: final result Visualisation
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Conclusions Visualisation
• Visualising is simple, and gives a lot of information.
• Folding (one-dimensional) data into (2D) image introduces discontinuities in the edges. Other visualization methods (cylindrical coordinates) possible.
• Phasing can be of great help.
• Careful with “cosmetic” effects!
Second step: Pre-processingSecond step: Pre-processing
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Ste
p 2
Pre-process
Ste
p 3
Measure
Ste
p 1
View
• Base-line correction
• Noise filtering• Spike filtering• Alignment• … etc.
• Peak detection / integration
• Calibration• Deconvolution• Pattern
recognition• Class separation
• Folding• Phasing
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Pre-processingTypical problems: base-line drifts and noise
0 10 20 30Time, min
-101234
Abso
rban
ce, A
U
0 10 20 30Time, min
-0.15-0.1
-0.050
0.050.1
Abso
rban
ce, A
U
Base-line drifts Noise
20 20.2 20.4 20.6 20.8 21Time, min
-0.04-0.02
00.020.040.06
Abso
rban
ce, A
U
0 10 20 30Time, min
-0.2
-0.1
0
0.1
0.2
Abso
rban
ce, A
U
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Pre-processingBase-line drifts.
0 10 20 30Time, min
-0.15-0.1
-0.050
0.050.1
Abso
rban
ce, A
U
Original chromatogram
Corrected chromatogr.
Fitted base-line correction
Weighted least squares fitting
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Pre-processing
Originalchromatogram
Base-line drifts.
Weighted least squares fitting
Fitted base-line correction
Correctedchromatogram
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Pre-processing
Originalchromatogram
Fitted base-line correction
Correctedchromatogram
Other (more sophisticated) options:
- Use splines- Base-line correction coupled to peak detection- Fourier-transform based approaches- Wavelet-based approaches
Base-line drifts.
Weighted least squares fitting
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Pre-processing
S.E. Reichenbach, M. Ni, D. Zhang, E.B. Ledford Jr., J. Chromatogr. A, 985 (2003) 47 - 56
Base-line is reached at the (half) upper part
Base-line is reached at the (half) bottom part
Base-line drifts.
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Pre-processing
S.E. Reichenbach, M. Ni, D. Zhang, E.B. Ledford Jr., J. Chromatogr. A, 985 (2003) 47 - 56
Consider the positions with the smallest values in each half
Estimate local background parameters using neighboring
values
Interpolate the main background trend and
subtract it
Base-line drifts.
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Pre-processingNoise removal. Smoothing and derivatives.
Savitzky-Golay filter is the most common method
Two parameters should be optmisized
• Window size• Polynomial degree
These parameters govern how much correlated noise is
removed
• Large window sizes and low polynomial degree
Too much noise is removed (chromatograms appear deformed)
• Small window sizes and large polynomial degrees Too much noise remains
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Pre-processingNoise removal. Smoothing and derivatives.
Window size = 11Polynomial = 2
Window size = 41Polynomial = 2
Window size = 251Polynomial = 2
Originalchromatogram
Correctedchromatogram
G. Vivó-Truyols, P.J. Schoenmakers, "Automatic selection of optimal Savitzky-Golay smoothing”, Anal. Chem. 78 (2006) 4598-4608.
FID
sig
nal
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Pre-processingNoise removal. Spikes.
A good way of removing spikes consists of passing a median filter(before the Savitsky-Golay filter)
Savitzky-Golay filter
Median filterBase-line correction
Original data
Parameter to tune: window
size
Parameters to tune: window size and
polynomial degreeOptimizing three parameters
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Pre-processingAlignment.
Two types of alignment
Between-chromatogram alignment
Between-modulation alignment
Alignment is not always necessary, depending on the final objective of the analysis
Rarely doneUsing 2D
techniques(folded data)
Using 1D techniques
(unfolded data)
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
15 different (but related) chromatograms
13.5 13.6 13.7 13.8Time, min
Alignment
Pre-processingAlignment. COW (1D)
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Pre-processingAlignment. Using (truly) 2D algorithms
S. Castillo, I. Mattila, J. Miettinen, M. Orešič, T.Hyötyläinen, Anal. Chem. 83 ( 2011) 3058–3067
Score alignment in GCxGC-MS
D. Zhang, X. Huang, F.E. Regnier,M. Zhang, Anal. Chem., 80 (2008) 2664–2671
COW-adapted GCxGC-MS (using
single channel)
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Conclusions Pre-processing
• Pre-processing methods are almost the same: one-dimensional = two-dimensional. Normally done in the (pre-folded) raw data.
• Every case needs a particular solution (it always exists, but some care should be taken!)
Third step: measureThird step: measure
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Ste
p 2
Pre-process
Ste
p 3
Measure
Ste
p 1
View
• Base-line correction
• Noise filtering• Spike filtering• Alignment• … etc.
• Peak detection / integration
• Calibration• Deconvolution• Pattern
recognition• Class separation
• Folding• Phasing
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Peak detection in one step: the watershed algorithm Peak detection
Most common software programs use the watershed algorithm to detect peaks in 2D chromatography:
J. De Bock et al., doi 10.1007/11558484
1 2 3 4
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
??? ?Single catchmentbasin?
G. Vivó-Truyols, H.G. Janssen, J. Chromatogr. A, doi:10.1016/j.chroma.2009.12.063
Peak detection in one step: the watershed algorithm Peak detection
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
The first problem using the watershed algorithm.
Peak detection in one step: the watershed algorithm Peak detection
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
The watershed algorithm
Does a two-dimensional chromatographic peak form a single basin?
50100
150
0
5
100
500
1000
1500
y-3
y-2
y-1
y0
y3
y2y1
70 90 110 1304.5
5
5.5
1tR, AU
2 t R, A
U
4 4.5 5 5.5 60
500
1000
1500
y-2
y-1
y-3
y2
y1
y0
2tR, AU
Watershed algorithm works
G. Vivó-Truyols, H.G. Janssen, J. Chromatogr. A, doi:10.1016/j.chroma.2009.12.063
Automated peak detection in 2D chromatography
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
The watershed algorithm
Does a two-dimensional chromatographic peak form a single basin?
70 90 110 1304.5
5
5.5
1tR, AU
2 t R, A
U
50100
150
0
5
100
500
1000
1500
y-3
y-2
y-1
y0
y3
y2y1
4 4.5 5 5.5 60
500
1000
1500
2tR, AU
y-2
y-1
y3
y2
y1
y0
Saddle point
Watershed algorithm fails!
y-1
y0
G. Vivó-Truyols, H.G. Janssen, J. Chromatogr. A, doi:10.1016/j.chroma.2009.12.063
Automated peak detection in 2D chromatography
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Critical variability in second-dimension retention time:
2
2 1
pq
tP critfail
22 1
pq
tP critfail
mp
2
1
m
p
2
1
1mq 1mq
q=0.5 (eight cuts per peak)
q=1 (four cuts per peak)
q=2 (two cuts per peak)
q=4 (1 cut per peak)0 0.04 0.08 0.12 0.16 0.2
tR,crit, min
0
0.2
0.4
0.6
0.8
1
Pfa
il
p=5p=20
p=2.5
p=10
Current instruments
Current chromatographic practice
G. Vivó-Truyols, H.G. Janssen, J. Chromatogr. A, doi:10.1016/j.chroma.2009.12.063
Peak detectionin one step: the watershed algorithm Peak detection
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
S. Peters, G. Vivó-Truyols, P.J. Marriott and P.J. Schoenmakers, J. Chromatogr. A 1156 (2007) 14.E.J.C. van der Klift, G. Vivó-Truyols, F.W. Claassen, F.L. van Holthoon, T.A. van Beek, J. Chromatogr. A, 1178 (2008) 43.
Time, arbitrary units (AU)
Ste
p
1 Detect peaks as in one-dimensional chromatography
Use information from derivatives (pre-processing
step)
Peak detectioni in two steps. Peak detection
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Ste
p
2 Merge peaks that belong to the same compound according to 2nd-dimension retention time differences
50100
150
0
5
100
500
1000
1500
4 4.5 5 5.5 60
500
1000
1500
2tR, AU
T: Tolerance criterion
70 90 110 1304.5
5
5.5
1tR, AU
2 t R, A
U
Peak detectioni in two steps. Peak detection
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Ste
p
2 Merge peaks that belong to the same compound according to 2nd-dimension retention time differences
50100
150
0
5
100
500
1000
1500
70 90 110 1304.5
5
5.5
1tR, AU
2 t R, A
U
4 4.5 5 5.5 60
500
1000
1500
2tR, AU
D
D>T ?
Peak detectioni in two steps. Peak detection
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Ste
p
3 Check unimodality
50100
150
0
5
100
500
1000
1500
4 4.5 5 5.5 60
500
1000
1500
2tR, AU
70 90 110 1304.5
5
5.5
1tR, AU
2 t R, A
U
Peak detectioni in two steps. Peak detection
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Alternative algorithms?
0 5000 10000 15000
50
60
70
80
90
100
1st dimension ret. time, s
2nd
dim
ensi
on re
t. tim
e, s
0
1
2
3
4
5
6
7
8
9
10x 107
Automated peak detection in 2D chromatography
1st dimension, Ag-Column
2nd
dim
ensi
on, R
P
Data courtesy of Teris van Beek, University of Wageningen (NL)
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Automated peak detection in 2D chromatography
Each of these dots corresponds to a detected peak
Alternative algorithms?
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
7400 7500 7600 7700 7800 7900 8000 8100 820048
50
52
54-0.5
0
0.5
1
1.5
2
2.5
x 108
Sig
nali
nten
sity
, AU
Peak A
Peak B
7400 7500 7600 7700 7800 7900 8000 8100 820048
50
52
54-0.5
0
0.5
1
1.5
2
2.5
x 108
Sig
nali
nten
sity
, AU
Peak A
Peak B
Possibility 1 Possibility 2
Automated peak detection in 2D chromatographyAlternative algorithms?
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Alternative algorithms?
0 5000 10000 15000
50
60
70
80
90
100
1st dimension ret. time, s
2nd
dim
ensi
on re
t. tim
e, s
0
1
2
3
4
5
6
7
8
9
10x 107
In general, any group of 1D peaks may exhibit x possibilities of arrangement in 2D peaks that do not violate the rules of unimodality and 2tR < T (tolerance criterion)!!!
Automated peak detection in 2D chromatography
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
The Bayesian approach
1tR
2 t R
Detected 1D peaksPeak A
Peak B
Peak A
Peak B
Peak A
Peak B
Peak A
Peak B
Peak C
Sol 1 Sol 2 Sol 3
Sol 4
Peak A
Peak B
Sol n
…
Ste
p
1 Let’s consider all possible solutions of peak arrangement
Automated peak detection in 2D chromatography
1tR
2 t R
Detected 1D peaksPeak A
Peak B
Peak A
Peak B
Peak A
Peak B
Peak A
Peak B
Peak C
Sol 1 Sol 2 Sol 3
Sol 4
Peak A
Peak B
Sol n
…
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
The Bayesian approachS
tep
2 Discard those solutions that violate the unimodality criterion. Discard also those solutions that imply a too fragmented chromatographic peak.
Automated peak detection in 2D chromatography
Peak A
Peak B
Peak A
Peak B
Sol 1
Sol 2
Peak A
Peak B
Sol n
…
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
The Bayesian approachS
tep
3 Apply the Bayes theorem to calculate the probability of each solution
H1
Hypothesis
H2
…Hn
|D D| |D D|
Automated peak detection in 2D chromatography
1tR
2 t R
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
The Bayesian approach
|D ∝ D||D ∝ D|
I’m interested only in a relative value of
p(Hn|D)
|D ∝ D||D ∝ D|
All the priors have the same probability
|D D| |D D|
Automated peak detection in 2D chromatography
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
The Bayesian approach Automated peak detection in 2D chromatography
HPLC-2011
| … , 1 , , 1 1 | … , 1 , , 1 1
|
…12
12
12 1 1
212
2 21 1
|
…12
12
12 1 1
212
2 21 1
Peak phase
1st
dimension peak width
Total peak area
Prior probabilities
How much does your 1D peak profile look like a
peak?
Are the 2nd dimension retention times too far
away?The computer willdo the work for you!
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Automated peak detection in 2D chromatographyThe Bayesian approach
7400 7500 7600 7700 7800 7900 8000 8100 820048
50
52
54-0.5
0
0.5
1
1.5
2
2.5
x 108
Sig
nali
nten
sity
, AU
Peak A
Peak B
7400 7500 7600 7700 7800 7900 8000 8100 820048
50
52
54-0.5
0
0.5
1
1.5
2
2.5
x 108
Sig
nali
nten
sity
, AU
Peak A
Peak B
Possibility 1 Possibility 2
Probability = 51% Probability = 49%
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
2 t R
1tR
> 5000 peaks(or peak clusters)
Analyse each spot using
deconvolution
Deconvolution methods. Deconvolution
2D-FID chromatogram
Deconvolution
εHεyxA εHεyxA Bilinear model (for a single compound)
Cross productof two vectors
Modelled variance
xy
=
Unmodelled variance
Matrix
+
Total response
Matrix
A
Bilinear model
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
εHεyxA
n
iii
1εHεyxA
n
iii
1
Total response
Matrix
A
Unmodelled variance
Matrix
+
Modelled variance
=
+
+
+
...
x1y1
x2y2
xnyn
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Deconvolution
Bilinear model (for a n compounds)
Bilinear model
-50
5
10
15
20
25
4.04.2
4.44.6
225250
275300
325350
375
Derivatisationagent
Amino-acid
Deconvolution
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Solving the bilinear model. OPA-ALS
εYXεHεyxA
n
iii
1εYXεHεyxA
n
iii
1
y1
y2
Peak profiles inthe first order ofmeasurement(spectra)
x1 x2
Peak profiles inthe second orderof measurement(chromatograms)
AXY AXY
YAX YAX
Apply constraints
on Y
Apply constraints
on X
Initial estimates of X (or Y)
Final X and Y
DeconvolutionOPA-ALS
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
DDTiid det DDTiid det ixRD ixRD
1
x1 x2 …References(R) xi
spectrum at the ith retention time (xi)
The user selects the retention time of
maximum dissimilarity
Consider the spectrum at the selected time as the new R matrix
3.8 4 4.2 4.4 4.6time, min
d i (m
AU
)
Dissimilarity Reference(s) (R)
Mean spectrum
200 240 280 320 360 400lambda, nm
0
2
4
6
Abs
orba
nce,
mAU
DissimilarityDeconvolutionOPA-ALS
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Dissimilarity Reference(s) (R)
3.8 4 4.2 4.4 4.6time, min
d i (m
AU
)
200 240 280 320 360 400lambda, nm
0
4
8
12
16
Abs
orba
nce,
mAU2
3.8 4 4.2 4.4 4.6time, min
d i (m
AU
)
200 240 280 320 360 400lambda, nm
05
10152025
Abs
orba
nce,
mAU
3 Only noise
Initial spectra (X)
DeconvolutionOPA-ALS
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Deconvolution
Initial X
200 240 280 320 360 400lambda, nm
05
10152025
Abs
orba
nce,
mAU
AXY AXY
YAX YAX
Apply constraints
on Y
Apply constraints
on X
200 240 280 320 360 400lambda, nm
0
0.1
0.2
0.3
0.4
Abso
rban
ce, m
AU(n
orm
alis
ed)
3.8 4 4.2 4.4 4.6time, min
0
20
40
60
80
Abs
orba
nce,
mAU
Derivatisationagent
Amino-acid
Derivatisationagent
Amino-acid
Final X
Final Y
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
50100
150
0
5
100
500
1000
1500
y-3
y-2
y-1
y0
y3
y2y1
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Let’s reflect… Deconvolution
… peak profiles in the second dimension are not exactly the same (e.g. due to retention time misalignments…)
What happens with the model if…
4 4.5 5 5.5 60
500
1000
1500
2tR, AU
y-2
y-1
y3
y2
y1
y0
A Cross productof two vectors
xy
MatrixMatrix= +
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
What to do then in practice? Deconvolution
If you have multichannel detection: matrix unfolding
If you don’t have multichannel detection… you
could align… but forget about it!
Misalignments in the 2nd-dimension
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Matrix unfolding Deconvolution
1tR (region)
2 t R(r
egio
n) tR
m/z
m/z=50m/z=51
…m/z=750
m/z=50 m/z=51 … m/z=750
A Cross productof two vectors
x
y
MatrixMatrix= +
MatrixA
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Matrix unfolding. Example. Deconvolution
PLO and POL
1st dimension, Ag-Column
2nd
dim
ensi
on, R
P
Injection of corn oil in
LCxLC-MS(zoom in)
Data courtesy of Teris van Beek, University of Wageningen (NL). M. Navarro et al., presented at HPLC-Geneva.
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Matrix unfolding. Example. Deconvolution
20 22 24 26 28 30 326065707580859095
100105
570575580585590595600605610615
0
0.2
0.4
0.6
0.8
1 575
577
601
570575580585590595600605610615
0
0.2
0.4
0.6
0.8
1
575
577
601
m/z
2 t R
1tR
2 t R
1tR
Abu
ndan
ce
m/z
Abu
ndan
ce
m/z
PLO
POL
Reference library
Data courtesy of Teris van Beek, University of Wageningen (NL). M. Navarro et al., presented at HPLC-Geneva.
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Matrix unfolding & matrix augmentation Deconvolution
H. Parastar et. al, Anal. Chem. 83 (2011) 9289
εYXεHεyxA
n
iii
1εYXεHεyxA
n
iii
1
y1
y2
Peak profiles inthe first order ofmeasurement(spectra)
x1 x2
Peak profiles inthe second orderof measurement(chromatograms)
AXY AXY
YAX YAX
Apply constraints
on Y
Apply constraints
on X
Initial estimates of X (or Y)
Final X and Y
DeconvolutionFinding some unique mases. AMDIS.
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
If I can find a unique mass for each compound…
AXY AXY
DeconvolutionFinding some unique mases. AMDIS.
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
If I can find a unique mass for each compound…
… I could find the peak profiles (X) for each compound…
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Let’s reflect… Deconvolution
Two peaks are completely resolved in one dimension but retention times are the same in the other dimension…
What happens with the model if…
εHεyxA
n
iii
1εHεyxA
n
iii
1For n compounds
“Mathematical compound” is not the same as
“chemical compound”!!
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Matrix unfolding Deconvolution
1tR (region)
2 t R(r
egio
n) tR
m/z
m/z=50m/z=51
…m/z=750
m/z=50 m/z=51 … m/z=750
A Cross productof two vectors
x
y
MatrixMatrix= +
MatrixA
Could we avoid this unfolding and use true
third-order data?
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Using third-order data Deconvolution
Trilinear model (for n compounds)
??? ?What are the implications of this?
Profiles are unique in the three order of measurements
Solving the tri-linear model
Via PARAFAC, the three profiles are unique
Via PARAFAC2, one of the conditions of uniqueness can be relaxed (normally the
alignment between 1st and 2nd dimensions)
Other algorithms possible (e.g., TLD), but rarely used…
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Deconvolution methods. Deconvolution
A.E. Sinha, J.L. Hope, B.J. Prazen, C.G. Fraga, E.J. Nilsson, R.E. Synovec, J. Chromatogr. A, 1056 (2004) 145 - 154
An example of PARAFAC
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Deconvolution methods. Summary. Deconvolution
Deconvolutionmethods
Using 1D (unfolded) data
Use truly (folded) 2D data
• ALS or rank annihilation methods• Does not need between-
modulation alignment• Similar to AMDIS
• PARAFAC (needs between-modulation alignment)
• PARAFAC2 (more robust against between-modulation alignment)
Main problem: determine the number of components behind the peak cluster
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Deconvolution methods. Discussion. Deconvolution
Trilinear model (for n compounds)
??? ?What are the
implications of having high-resolution MS
instead of nominal mass?
Bilinear model (for n compounds)
Third step: measureThird step: measure
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Ste
p 2
Pre-process
Ste
p 3
Measure
Ste
p 1
View
• Base-line correction
• Noise filtering• Spike filtering• Alignment• … etc.
• Peak detection / integration
• Calibration• Deconvolution• Pattern
recognition• Class separation
• Folding• Phasing
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Pattern recognition as a variable reductionS
tep
1 Obtain chromatogram(s)
Ste
p
2 Peak detection
Ste
p
3 Pattern recognition
Ste
p
4 Unknown sample characterized
HPLC-MS/MS GCxGC-MS
GC-MS
Base-line correction Alignment
Peak detection
Digital variables
Chemicalvariables
Process/biologicalvariables
From digital variables (chromatogram) to process variables
Raw data
Features
Healthy/sick
Pattern recognition
Information
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Is peak detection necessary?
Chromatographic data
Using raw data directly
Ana
lysi
s of
flam
e ac
cele
rant
s us
ing
GC
xGC
Source ASource B
… for example:
Pattern recognition
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Pattern recognition as a variable reductionS
tep
1 Obtain chromatogram(s)
Ste
p
2 Peak detection
Ste
p
3 Pattern recognition
Ste
p
4 Unknown sample characterized
HPLC-MS/MS GCxGC-MS
GC-MS
Base-line correction Alignment
Peak detection
Digital variables
Chemicalvariables
Process/biologicalvariables
From digital variables (chromatogram) to process variables
Raw data
Features
Healthy/sick
Pattern recognition
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Pattern recognition in GCxGC. The options Pattern recognition
Pattern recognition in GCxGC
Using raw data Using peak tables
• Alignment is critical• Less chance to miss important
compounds• Normally done with the unfolded
(raw) data, but not always (e.g. N-PLS)
• Alignment not important, but peak tracking is essential (normally MS should be present)
• Chance to miss important compounds (close to the S/N)
• (Truly) 1D method
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Pattern recognition in GCxGC. Supervised methods. Pattern recognition
Any method will be prone to overfitting
In supervised pattern recognition of GCxGC, a tremendous reduction of variables is performed (form millions to a few tens/hundreds)
Any variable pre-reduction (e.g. using Fisher ratios) should be done within a cross-validation loop
Otherwise the results will be optimistic (a method that seems to work, when in fact it only works for that data)
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Pattern recognition in GCxGC. Example of a wrong strategy Pattern recognition
Ste
p 2Variable
selection: Fisher ratio on the raw
data Ste
p 3Supervised
pattern recognition: PLS-DA to
separate sick from healthyS
tep
1Obtain GCxGCchromatograms for sick (50) and
healthy (50) Ste
p 4Consider the
coefficients from PLS-DA as indicators of
potential metabolites
Objective: discovering metabolites responsible for cancer tumor
??? ?Aren’t you Overfitting? No, I’ve been cross-
validating the PLS-DA
… but the variable pre-selection has been done with the full data set!!
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Pattern recognition in GCxGC. Example of a wrong strategy Pattern recognition
Ste
p 2Variable
selection: Fisher ratio on the raw
data Ste
p 3Supervised
pattern recognition: PLS-DA to
separate sick from healthyS
tep
1Obtain GCxGCchromatograms for sick (50) and
healthy (50) Ste
p 4Consider the
coefficients from PLS-DA as indicators of
potential metabolites
Objective: discovering metabolites responsible for cancer tumor
??? ?Aren’t you Overfitting? No, I’ve been cross-
validating the variable selection and the PLS-DA
correct
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Orig
inal
dat
a se
t
Ste
p 2Withdraw
subset 1, fit the model with the rest, and test
the model with subset 1 S
tep
3
Do the same for the
other k sectionsS
tep
1Divide the original data
set in k subsets
(randomly) Ste
p 4
Sum up the error of the model in all
validation sets
1
2
…
k
Cal
ibra
tion
Val
Cal
ibra
tion
Val
Cal
Test the model here
Test the model here
Cal
Val
Cal
Pattern recognition in GCxGC. Example of a correct strategy Pattern recognition
“model” = “variable pre-selection (Fisher ratio) + PLS-DA”
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Conclusions Multivariate methods
• Deconvolution: normally done with the unfolded data (less problems with between-modulation alignment)
• Deconvolution: problem to establish the number of compounds (normally done in a manual way)
• Two ways for pattern recognition: with raw data (normally preferred) or with peak table.
• Careful with validation of supervised pattern recognition. Variable pre-selection should be included in the validation loop.
Van ‘t Hoff Institute for Molecular SciencesVan ‘t Hoff Institute for Molecular Sciences University of AmsterdamUniversity of Amsterdam
Further debate… Multivariate methods
This presentation has been uploaded in my blog: www.tecnometrix.com
Feel free to download and generate debate if you wish!