EMBO Practical Course on Metabolomics Bioinformatics for Life Scientists
Oscar Yanes, PhD
“Dissecting an untargeted metabolomic workflow”
Untargeted metabolomics workflow
Experimental design
Sample preparation
Sample analysis by MS and NMR
Pre-processing, data analysis
Metabolite identification
Hypothesis
Experimental validation
Ultimate goal of metabolomics
Disease vs. control
List of metabolites differentially regulated
Biomarker discovery
Pathway analysis, model construction, scientific literature
Hypothesis
Validation
Mechanism
THE IMPORTANCE OF EXPERIMENTAL DESIGN
COLLABORATOR: I want to do metabolomics
ME: ……
COLLABORATOR: I have many samples at -80°C. Could you do metabolomics and find out something?
ME: !!!!
BASIC DIAGRAM OF A MASS SPECTROMETER
Gas-phase: gas chromatography
Liquid-phase: liquid chromatography, capillary electrophoresis
Solid-phase: surface-based
Electron ionization (EI), chemical ionization (CI), atmospheric pressure chemical ionization (APCI), electrospray ionization (ESI), laser desorption ionization (LDI)
[Figure: four panels (Glucose, Lactate, Pyruvic Acid, Choline); y-axis Area/Area (IS), x-axis Time (h) at 0, 4, 12, 24]
Watch out for serum/plasma samples from biobanks!
Requisite for untargeted metabolomics
Maximize ionization efficiency over the whole mass range (e.g., m/z 80-1500)
Number of features: coverage of the metabolome
Intensity of the features: accurate quantification and identification of metabolites
How do we increase the number of features and their intensity?
Feature: molecular entity with a unique m/z and retention time value
[Figure: LC-MS data plotted on time, mass, and intensity axes]
Sample preparation: extraction method
Chromatography: stationary phase, mobile phase
Ion Funnel Technology, etc.
Extraction method
Hot EtOH/ammonium acetate vs. cold acetone/MeOH
Only 45% of the metabolites are detected with acetone/MeOH
[Figure annotation: MS/MS threshold]
Yanes O., et al. Anal. Chem. 2011; 83(6):2152-61
Liquid Chromatography: mobile-phase
Ammonium fluoride, ammonium acetate, formic acid
[Figure labels: ammonium fluoride, ammonium acetate, F-]
Yanes O., et al. Anal. Chem. 2011; 83(6):2152-61
Chromatography: stationary phase
HILIC, RP C18/C8
LC flow rate and pressure: UPLC vs. HPLC vs. nanoLC (vs. GC!)
[Figure: HPLC and UPLC chromatograms; x-axis in minutes]
Effect of pH; ammonium salts; ion pairs (e.g., TBA)
PRACTICAL ASPECTS
1. Number of scans/second
Implications in LC/MS and GC/MS: quantification (maximum intensity or integrated area)
2. Instrument resolution
Implications: detector saturation, quantification
3. Sample amount injected
Implications: detector saturation
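To illustrate why the number of scans per second matters for quantification, here is a minimal sketch that samples an idealized Gaussian chromatographic peak at different scan rates and compares the maximum intensity with the integrated area. The peak parameters (apex at 30 s, sigma 1.5 s, height 1000) and the scan rates are hypothetical values chosen for illustration, not figures from the course.

```python
import math

# Hypothetical peak: apex time (s), width sigma (s), apex height (counts)
APEX, SIGMA, HEIGHT = 30.0, 1.5, 1000.0

def gaussian_peak(t):
    """Intensity of an idealized Gaussian chromatographic peak at time t."""
    return HEIGHT * math.exp(-0.5 * ((t - APEX) / SIGMA) ** 2)

def sample_peak(scans_per_second, start=20.0, end=40.0):
    """Sample the peak at a fixed scan rate between start and end (s)."""
    n = int(round((end - start) * scans_per_second)) + 1
    times = [start + i / scans_per_second for i in range(n)]
    return times, [gaussian_peak(t) for t in times]

def integrated_area(times, intensities):
    """Trapezoidal integration of the sampled peak."""
    return sum(
        0.5 * (intensities[i] + intensities[i + 1]) * (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    )

# Analytical area under the Gaussian, used as the reference value
TRUE_AREA = HEIGHT * SIGMA * math.sqrt(2 * math.pi)

for rate in (0.25, 1.0, 5.0):
    times, intensities = sample_peak(rate)
    print(
        f"{rate} scans/s: max={max(intensities):.0f}, "
        f"area={integrated_area(times, intensities):.0f}, "
        f"true area={TRUE_AREA:.0f}"
    )
```

In this toy case the slowest scan rate never samples the apex, so the maximum intensity is strongly underestimated and the integrated area is biased as well; with enough points across the peak, both converge on the true values.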
FROM RAW DATA TO METABOLITE IDs
RAW METABOLOMICS DATA
RAW DATA CONVERSION
PRE-PROCESSING
STATISTICAL ANALYSIS
METABOLITE IDENTIFICATIONS
PATHWAY ANALYSIS
LC/MS, GC/MS
LC-MS RAW DATA
LC-MS WORKFLOW
mzData → PREPROCESSING → STATISTICAL ANALYSIS → IDENTIFICATION

mZRT Features Table
        mZRT1   mZRT2   mZRT3   ...
M1      I       ...     ...     ...
M2      ...     I       ...     ...
...     ...     ...     ...     ...

mZRT1 → M1
mZRT2 → M2
Feature: individual ion with a unique mass-to-charge ratio and a unique retention time
RAW LC-MS DATA TO mzXML: PROTEOWIZARD
[Nature Biotechnology, 30 (918–920) (2012)]

VENDOR          FORMATS                      CONVERTER
Agilent         MassHunter .d                ProteoWizard
Bruker          Compass .d, YEP, BAF, FID    ProteoWizard
Thermo Fisher   RAW                          ProteoWizard
Waters          MassLynx .raw                ProteoWizard
AB Sciex        WIFF                         ProteoWizard
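As a hedged sketch of the conversion step, the snippet below builds a call to ProteoWizard's command-line converter, msconvert, using its `--mzXML` and `-o` (output directory) options. The file name `sample_01.raw` and the output directory `mzxml` are hypothetical, and the helper function is an illustration rather than part of ProteoWizard.

```python
import shlex
from pathlib import Path

def msconvert_command(raw_file, out_dir="mzxml"):
    """Build the msconvert argument list for converting one vendor file to mzXML.

    raw_file and out_dir are placeholders; adjust them to your own data.
    """
    return ["msconvert", str(Path(raw_file)), "--mzXML", "-o", str(out_dir)]

cmd = msconvert_command("sample_01.raw")
print(shlex.join(cmd))
```

On a machine with ProteoWizard installed, the list could be passed to `subprocess.run(cmd, check=True)`; vendor formats generally require the vendor libraries that ship with the Windows build.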
XCMS PRE-PROCESSING
• http://metlin.scripps.edu/download/
• Free & Open Source
• Based on R
• On-line version
• Suitable for: GC-MS, LC-MS
Analytical Chemistry, 78(3), 779–787, 2006
Analytical Chemistry, 84(11), 5035-5039, 2012
1. FEATURE DETECTION
[BMC Bioinformatics, 2008 9:504]
1. Dense regions in m/z space
2. Gaussian peak shape in chromatogram
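The two criteria above can be sketched in a few lines. This is a simplified illustration, not the XCMS implementation: it groups centroids from consecutive scans whose m/z agree within a ppm tolerance ("dense regions"), then applies a crude rise-then-fall check that stands in for a proper Gaussian or wavelet-based peak-shape test. The tolerance, minimum length, and all data values are assumptions made for the example.

```python
def find_rois(scans, ppm=25.0, min_scans=4):
    """Group centroids from consecutive scans whose m/z agree within ppm.

    scans: list of scans, each a list of (mz, intensity) centroids.
    Returns ROIs as lists of (scan_index, mz, intensity).
    """
    open_rois, finished = [], []
    for idx, scan in enumerate(scans):
        still_open, used = [], set()
        for roi in open_rois:
            mean_mz = sum(p[1] for p in roi) / len(roi)
            tol = mean_mz * ppm * 1e-6
            best = None  # closest unused centroid within tolerance
            for k, (mz, _) in enumerate(scan):
                if k in used or abs(mz - mean_mz) > tol:
                    continue
                if best is None or abs(mz - mean_mz) < abs(scan[best][0] - mean_mz):
                    best = k
            if best is None:
                # ROI ends here; keep it only if it spans enough scans
                if len(roi) >= min_scans:
                    finished.append(roi)
            else:
                used.add(best)
                roi.append((idx, scan[best][0], scan[best][1]))
                still_open.append(roi)
        # Every unmatched centroid starts a new candidate ROI
        for k, (mz, inten) in enumerate(scan):
            if k not in used:
                still_open.append([(idx, mz, inten)])
        open_rois = still_open
    finished.extend(r for r in open_rois if len(r) >= min_scans)
    return finished

def looks_like_peak(roi):
    """Crude shape test: intensity rises to an interior apex, then falls."""
    ints = [p[2] for p in roi]
    apex = ints.index(max(ints))
    rising = all(a < b for a, b in zip(ints[:apex], ints[1:apex + 1]))
    falling = all(a > b for a, b in zip(ints[apex:], ints[apex + 1:]))
    return 0 < apex < len(ints) - 1 and rising and falling

# Hypothetical centroided scans: one consistent ion near m/z 180.0634 plus noise
scans = [
    [(250.1000, 50.0)],
    [(180.0634, 100.0), (300.2000, 40.0)],
    [(180.0636, 400.0)],
    [(180.0633, 900.0), (411.3000, 30.0)],
    [(180.0635, 420.0)],
    [(180.0634, 110.0)],
    [(523.4000, 45.0)],
]
rois = find_rois(scans)
features = [r for r in rois if looks_like_peak(r)]
print(len(rois), len(features))
```

Single-scan noise centroids never reach the minimum length and are discarded, while the ion that persists across consecutive scans with a peak-like intensity profile is kept as a feature candidate.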
2. RETENTION TIME CORRECTION
STATISTICAL ANALYSIS
FEATURES RANKING
Those features that vary according to our phenomenon are retained for further identification experiments.
• 10^3-10^4 mZRT features: IDENTIFICATION NOT FEASIBLE!
• Feature redundancy:
- adducts: [M+H]+, [M+Na]+, [M+NH4]+, [M+H-H2O]+ …
- isotopes: [M+1], [M+2], [M+3]
• Many mZRT features are noisy in nature and irrelevant to our phenomenon
FEATURES RANKING CRITERIA
(I) ANALYTICAL VARIABILITY
WORKLIST
- RANDOMIZE
- USE QCs TO CHECK ANALYTICAL VARIATION
$$CV^{(QC)}_{mZRT_j} = \frac{S^{(QC)}_{mZRT_j}}{\bar{X}^{(QC)}_{mZRT_j}} \cdot 100$$

$$CV^{(T)}_{mZRT_j} = \frac{S^{(T)}_{mZRT_j}}{\bar{X}^{(T)}_{mZRT_j}} \cdot 100$$
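A minimal sketch of how these coefficients of variation might be used, reading S as the standard deviation and X-bar as the mean of a feature's intensities. The retention rule (keep a feature when its CV in the QC injections is lower than its CV in the study samples) is an assumption for illustration, as are the intensity values.

```python
from statistics import mean, stdev

def cv_percent(values):
    """Coefficient of variation in percent: standard deviation / mean * 100."""
    return stdev(values) / mean(values) * 100.0

def filter_by_analytical_variability(qc, samples):
    """Keep features whose CV in the QCs is lower than in the study samples.

    qc, samples: dicts mapping feature name -> list of intensities.
    The comparison rule is an assumed criterion, not taken from the slides.
    """
    return [f for f in qc if cv_percent(qc[f]) < cv_percent(samples[f])]

# Hypothetical intensities for two mZRT features
qc = {"mZRT1": [100, 102, 98, 101], "mZRT2": [50, 90, 20, 70]}
samples = {"mZRT1": [60, 150, 90, 200], "mZRT2": [55, 60, 58, 52]}

kept = filter_by_analytical_variability(qc, samples)
print(kept)
```

Here mZRT1 is stable across QC injections but varies among samples, so it is kept; mZRT2 varies more in the QCs than in the samples, which suggests its variation is analytical rather than biological.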
USEFUL PLOTS IN EXPLORATORY DATA ANALYSIS
NEURONAL CELL CULTURES: KO (N=15) vs WT (N=11), #mZRT=6831
RETINAS: Hypoxia (N=12) vs Normoxia (N=13), #mZRT=7654
FEATURES RANKING CRITERIA
(IV) HYPOTHESIS TESTING + FDR
#features=4704
α=0.05 (235 features significantly varied by chance, 26% out of 900)
FDR=0.0074 (20 features varied by chance, 5% out of 404)
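The arithmetic behind these numbers can be checked directly: testing 4704 features at a 0.05 threshold yields about 235 expected false positives, which is roughly 26% of the 900 features called significant. The sketch below reproduces that calculation and adds a Benjamini-Hochberg step-up procedure as one common way to control the FDR; the slides do not say which FDR method was used, so that choice and the example p-values are assumptions.

```python
def expected_false_positives(n_tests, alpha):
    """Number of tests expected to pass the threshold by chance alone."""
    return n_tests * alpha

def benjamini_hochberg(p_values, q=0.05):
    """Return indices of tests rejected at FDR level q (step-up procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        # Largest rank whose p-value is below its rank-scaled threshold
        if p_values[i] <= rank / m * q:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])

by_chance = expected_false_positives(4704, 0.05)
share_of_hits = by_chance / 900 * 100
print(round(by_chance), round(share_of_hits))

# Hypothetical p-values for ten features
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(p_values, q=0.05)
print(rejected)
```

In the toy p-value list, five features pass an unadjusted 0.05 threshold, but only the two smallest survive the FDR correction.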
10M data points
# mZRT=51908
(i) analytical variability
# mZRT=38377
(ii) features intensity
# mZRT=4704
(iii) hypothesis testing + fold change
# mZRT=250
Annotation, database look-up, identification experiments
10-50 differential metabolites
Workflow for Metabolite Identification
Step 1: Select interesting features
Step 2: Search databases for accurate mass
Step 3: Filter “putative” identification list
Step 4: Compare RT and MS/MS of standards
Step 2: Search databases for accurate mass
HMDB, Metlin
Each feature returns many hits.
Common adducts: Na+, NH4+, K+, Cl-, and H2O loss
Adducts increase number of hits returned!
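To show why adducts multiply the number of hits, here is a small sketch of an accurate-mass look-up in positive ion mode. The four-compound dictionary is a hypothetical stand-in for a real database such as HMDB or Metlin, the 10 ppm tolerance is an assumption, and only a few common positive-mode adducts are considered (negative-mode adducts such as Cl- are left out).

```python
# Mass shift (Da) from neutral monoisotopic mass to observed m/z, positive mode
ADDUCTS = {
    "[M+H]+": 1.007276,
    "[M+Na]+": 22.989218,
    "[M+NH4]+": 18.033823,
    "[M+K]+": 38.963158,
    "[M+H-H2O]+": 1.007276 - 18.010565,
}

# Hypothetical mini-database: name -> neutral monoisotopic mass (Da)
DATABASE = {
    "L-glutamine": 146.069143,
    "L-lysine": 146.105528,
    "pyroglutamic acid": 129.042594,
    "D-glucose": 180.063390,
}

def search_accurate_mass(mz, ppm=10.0, adducts=ADDUCTS, database=DATABASE):
    """Return (compound, adduct) pairs whose neutral mass matches within ppm."""
    hits = []
    for adduct, shift in adducts.items():
        neutral = mz - shift  # neutral mass implied by this adduct assumption
        for name, mass in database.items():
            if abs(mass - neutral) / mass * 1e6 <= ppm:
                hits.append((name, adduct))
    return hits

hits_all = search_accurate_mass(147.0764)
hits_protonated = search_accurate_mass(147.0764, adducts={"[M+H]+": 1.007276})
print(hits_protonated)
print(hits_all)
```

With only the protonated ion considered, the query matches a single compound; once other adducts are allowed, the same m/z also matches pyroglutamic acid as its ammonium adduct, even in a database of four entries.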
Step 3: Filter “putative” identification list
Eliminate:
• drugs?
• intensity in the mass spectrum
• adducts?
• matches with obviously inconsistent retention times
Example: a feature with m/z 733.56 is unlikely to be a phospholipid if it has a 1-min RT with reverse-phase chromatography.
Look for hits that implicate the same pathway and give those features priority.
Standards can be expensive; your intuition will save you money and time!
What experimental data should be required to constitute a metabolite identification?
• Accurate mass?
• Retention time?
• MS/MS data?
Unlike proteomics, no journals have requirements or guidelines for publication of metabolite identifications.
“The identification of certain metabolites as their exact masses in their given biological context was strategic in the context of searching for biomarkers for CD.”
accurate mass
“…this method enables untargeted profiling of metabolites using accurate mass-retention time (AMRT) identifiers.”
accurate mass and retention time
“Metabolites were putatively identified on the basis of accurate mass and retention time, and confirmed by comparing MS/MS data of unknowns to model compounds.”
accurate mass, retention time, and MS/MS
Accurate mass only:
Accurate mass identifications are putative.
All structures shown have a neutral mass of 146.0691.
Mass error (even if small) and adducts add more possibilities!
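As a rough worked example of how mass error widens the candidate list, the relative mass error in parts per million and the resulting search window can be written as follows. The 5 ppm tolerance is an assumed value for illustration, not a figure from the slides.

```latex
\Delta m_{\mathrm{ppm}} = \frac{m_{\mathrm{obs}} - m_{\mathrm{theo}}}{m_{\mathrm{theo}}} \times 10^{6}
\qquad
146.0691 \times 5 \times 10^{-6} \approx 0.0007\ \mathrm{Da}
```

Under that assumption, every database entry with a neutral mass between about 146.0684 and 146.0698 remains a candidate, in addition to the exact isomers.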
Accurate mass and retention time:
Many structural isomers have the same retention time.
Citrate and isocitrate have the same retention time but different MS/MS patterns.
[Figure: MS/MS spectra of isocitrate and citrate]
Accurate mass, retention time, and MS/MS:
[Figure: Q-TOF MS/MS spectra (m/z 60-420) of the 7α-hydroxy-cholesterol standard and of the biological sample, both labelled 367.33]
Step 4: Compare RT and MS/MS of standards
Retention time will be available from the profiling experiment. However, obtaining MS/MS data for the feature of interest in the research sample typically requires another experiment.
Note: MS/MS only needs to be performed on one research sample. Pick a sample from the group in which the feature is up-regulated!
[Figure: group comparison; the other group is labelled "Do not pick this group"]
What if the feature of interest is not in the database (or the model compound is not commercially available)?
FT-ICR MS can be used to limit chemical formulas
MS/MS can reveal structural information (MS/MS libraries, bioinformatic approaches)
NMR can provide structural details
When a chemist is your best friend…
• Thermophilic organism adapted to live at high temperatures.
• Organisms challenged with cold temperature (72 °C) and compared to high-temperature (95 °C) controls.
Feature up-regulated at cold temperature
Identification???
[Figure: MS/MS spectra of the natural product and of N1-Acetylthermospermine]
Intensity of m/z 112 fragment is significantly different. NOT A MATCH!
Chemical synthesis of hypothesized structure is required
Synthesized metabolite produces MS/MS data comparable to the natural product from Pyrococcus furiosus.
[Figure: MS/MS spectra of the natural product, N1-Acetylthermospermine, and N4(N-Acetylaminopropyl)spermidine]
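Comparing MS/MS spectra by eye can be complemented with a similarity score. The sketch below computes a cosine similarity between two centroided spectra after matching fragments within an m/z tolerance; it is one common scoring approach, not necessarily what was used in this study. All m/z and intensity values are hypothetical, and the 0.01 Da tolerance is an assumption.

```python
import math

def cosine_similarity(spec_a, spec_b, tol=0.01):
    """Cosine similarity between two spectra given as lists of (mz, intensity).

    Each peak in spec_a is paired with the closest unused peak in spec_b
    that lies within tol (Da); unmatched peaks contribute only to the norms.
    """
    used, dot = set(), 0.0
    for mz_a, int_a in spec_a:
        best = None
        for j, (mz_b, _) in enumerate(spec_b):
            if j in used or abs(mz_a - mz_b) > tol:
                continue
            if best is None or abs(mz_a - mz_b) < abs(mz_a - spec_b[best][0]):
                best = j
        if best is not None:
            used.add(best)
            dot += int_a * spec_b[best][1]
    norm_a = math.sqrt(sum(i * i for _, i in spec_a))
    norm_b = math.sqrt(sum(i * i for _, i in spec_b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical spectra: a standard, a sample with the same pattern at twice
# the intensity, and a sample where one fragment is far weaker
standard = [(58.07, 20.0), (84.08, 60.0), (112.11, 100.0), (129.14, 35.0)]
same_pattern = [(58.07, 40.0), (84.08, 120.0), (112.11, 200.0), (129.14, 70.0)]
weak_fragment = [(58.07, 20.0), (84.08, 60.0), (112.11, 10.0), (129.14, 35.0)]

score_match = cosine_similarity(standard, same_pattern)
score_mismatch = cosine_similarity(standard, weak_fragment)
print(round(score_match, 3), round(score_mismatch, 3))
```

Because the score depends on relative fragment intensities, a spectrum with the same fragments but one strongly different intensity scores clearly lower than a true match, which mirrors the reasoning used to reject a candidate structure.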
Ultimate goal of metabolomics
Validate your metabolites!!
Targeted metabolomics: LC and GC-triple quadrupole MS
Molecular biology techniques: immunohistochemistry, reverse transcription-PCR, gene expression array, cell cultures, animal experimentation, …