Novel MS discovery-to-targeted SRM workflows incorporating ROC curve analysis of putative biomarker candidates in bona fide clinical samples
Mary F Lopez
Director, BRIMS
Swedish Proteomics Society, Gothenburg, Sweden, 11-21-10
Biomarker Discovery-to-Targeted Workflow for Proteomics
Fishing for differentially expressed proteins: Discovery of putative biomarkers
Targeting proteins in known pathways: Verification of putative biomarkers
The SIEVE workflow can be described in 3 main steps:
Frame
•Global intensity-based features
•Reconstructed chromatograms
•Significance statistics and annotation filters
Align
•Chromatographic alignment
•Scalable Adaptive Tiled Algorithm
Identify
•SEQUEST or Mascot for protein/peptides
•ChemSpider for small molecules
BRIMS Two-Pass Discovery Workflow using SIEVE and Orbitrap Velos
Design and Optimization
• Robust, commercially available nanoflow LC
• Commercially available columns
• Focus on stable spray
• Focus on high reproducibility of peak intensities, CV<8%
Pass 1: Quantification
• Chromatographic alignment
• Uncompromised full scan measurements
• Each sample is measured once, with no need for replicates
• Internal peptide standards (normalization)
• Triplicate runs of peptide standards every 12 runs (instrument QC)
• “Top10” data dependent acquisition
• Stringent precursor ion selection criteria
Pass 2: Identification
• Targeted fragmentation by inclusion list
• Relaxed precursor ion selection criteria
• Not all samples measured: subset as determined from SIEVE analysis
• Internal peptide standards
• Marker stratification using multi marker and single ROC AUC (SIEVE 1.3)
• Export to Ingenuity pathway analysis
[Diagram labels: methods; inclusion list]
In order to realize the quantitative power of SIEVE, data collection must be very robust.
LC setup for Two-Pass Workflow
• Thermo Proxeon EasyNanoLC eliminates the need for time-consuming SPE and sample pump downs. Just acidify, add standards and load digested peptides.
• Controlled trapping flow rates ensure consistent sample retention and salt removal.
• Rapid column equilibration allows for enhanced duty cycle.
• Hydrophobicity differences from trapping column to resolving column allow for effective refocusing.
• Larger resolving column allows for higher capacity and rapid application of gradient to the column (flow rates to 1.0 uL/min).
[Diagram: flow path from pump/autosampler through the 5 cm trap column and 25 cm resolving column to the Orbitrap Velos, with waste tubing and HV in (from source)]
Data Quality – Spray Stability
[Figure: spray stability data, panels labeled March 30 and April 4]
Spray stability is the largest factor in reproducible measurements.
Method for Assessing Systematic Errors without Sample Technical Replicates
• Systematic errors are assessed from triplicate acquisition of standard sample.
• Internal standards are spiked in all samples.
[Diagram: Pass 1 acquisition cycle. Blank run; standards calibration (three runs); column regeneration (Top 10 fragmentation); patient samples (full scan only, six runs shown)]
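The QC idea on this slide (triplicate standards for systematic error, spiked internal standards for normalization) can be sketched in a few lines. This is a minimal illustration, not the SIEVE implementation; the function names and all intensities are hypothetical.

```python
from statistics import mean, stdev

def percent_cv(values):
    """Coefficient of variation in percent: 100 * sample SD / mean."""
    return 100.0 * stdev(values) / mean(values)

def normalize(sample_intensity, standard_intensity, standard_reference):
    """Rescale a sample intensity by the internal-standard response of the same run.

    standard_reference is the expected standard intensity (for example the
    mean over all runs); both names are assumptions made for this sketch.
    """
    return sample_intensity * standard_reference / standard_intensity

# Hypothetical peak intensities of one standard peptide in a triplicate QC block.
qc_triplicate = [1.02e6, 0.98e6, 1.00e6]
print(f"QC %CV = {percent_cv(qc_triplicate):.1f}")

# Hypothetical sample peak measured in a run where the standard read low.
print(normalize(sample_intensity=2.0e5, standard_intensity=0.9e6, standard_reference=1.0e6))
```

A run block whose standard %CV drifts above the target (the slides quote CV<8%) would be a candidate for re-acquisition before any sample ratios are trusted.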
The Two-Pass workflow increases sensitivity by effectively fractionating samples in silico
• Typical MS acquisition parameters are not geared for quantification.
• Data dependent acquisition triggers MS2 based on intensity, so most low abundance biomarkers are not identified in complex mixtures with large dynamic range, i.e., blood.
• Classical “shotgun” approaches focus on physical sample fractionation strategies such as depletion and cation exchange coupled with data dependent acquisition.
• Physical fractionation such as depletion and cation exchange results in loss of albumin binding proteins and multiple runs for each sample.
• These approaches are very labor intensive, time consuming and typically do not allow for rigorous quantification and statistical power because fewer samples are analyzed due to time and instrument constraints.
• The Two-Pass Workflow using Inclusion Lists optimizes parameters for full scan quantification and MS2 triggering separately.
• This results in:
 - Higher sensitivity and deeper coverage of the proteome, i.e., more IDs
 - Precise and reproducible quantification
 - Flexibility in creating the inclusion list based upon desired attributes such as differential expression, PTMs or other parameters
 - Fewer replicates needed, since LC reproducibility is high and %CVs are low (ca 8%)
 - Increased biological sampling power (more samples can be run in a shorter time)
 - Less of the circular biomarker identification syndrome, i.e., we identified albumin AGAIN
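As an illustration of building an inclusion list from Pass 1 results, the sketch below keeps only features that pass a fold-change and P value filter and emits an m/z plus retention-time window for each. The `Frame` fields, the thresholds and the window width are assumptions for this example; they are not SIEVE's export format or the settings used in the study.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    mz: float       # precursor m/z of the quantified feature
    rt_min: float   # retention time at the peak apex, in minutes
    ratio: float    # expression ratio between the two sample groups (> 0)
    p_value: float  # significance of the ratio

def build_inclusion_list(frames, min_fold=2.0, max_p=0.05, rt_window=2.0):
    """Return (m/z, rt_start, rt_end) rows for differentially expressed frames."""
    selected = []
    for f in frames:
        # Treat up- and down-regulation alike by folding the ratio above 1.
        fold = max(f.ratio, 1.0 / f.ratio)
        if fold >= min_fold and f.p_value <= max_p:
            half = rt_window / 2
            selected.append((f.mz, f.rt_min - half, f.rt_min + half))
    # Order by start time so the list follows the LC gradient.
    return sorted(selected, key=lambda row: row[1])

# Hypothetical Pass 1 frames.
frames = [
    Frame(mz=500.25, rt_min=30.0, ratio=0.4, p_value=0.001),
    Frame(mz=610.30, rt_min=12.0, ratio=1.1, p_value=0.5),
    Frame(mz=720.40, rt_min=20.0, ratio=3.0, p_value=0.01),
]
for row in build_inclusion_list(frames):
    print(row)
```

Swapping the filter (for example to PTM-bearing features) changes what Pass 2 fragments without touching the Pass 1 quantification.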
Quantitative Statistics for the Two-Pass Workflow
[Bar chart: unique peptides. Bar values: 2076, 498, 461, 540. Categories: data dependent “Top 10”, inclusion list 1, inclusion list 2, inclusion list 3]
Data Dependent “Top 10” vs Inclusion list
Dataset from a recent collaboration on stroke (discussed in later slides)
Ion Score vs Concentration of Spiked Standard Peptide in Plasma
[Plot: Mascot ion score (y axis, 0 to 300) vs concentration in amol (x axis, log scale) for two series: Top 10 and Two-Pass]
Ongoing collaboration with Dr. MingMing Ning, Mass General Hospital and Harvard University
Discovery of Blood Biomarkers in PFO related Acute Stroke
Application of Discovery Two-Pass Workflow using SIEVE and Orbitrap Velos
Atrial septum
•The prevalence of PFOs in the general population is around 25%, but it is doubled in cryptogenic (unknown cause) stroke patients. These patients are often young and “healthy”.
•If there is a clot traveling into the right side of the heart, it can cross the PFO, enter the left atrium, and travel out of the heart and to the brain causing a stroke.
•This suggests a causal relationship between PFO and cryptogenic stroke.
•Supported by NIH/NINDS (Dr Tom Jacobs), MGH Cardio-Neurology Division evaluates patients with PFO related stroke and the therapeutic efficacy of surgical PFO closure and other stroke treatment.
•Venous blood samples from stroke patients are taken before PFO closure (upon admission) and at the 12-month follow-up after PFO closure.
•Biomarkers for PFO-related stroke could be clinically useful.
Number of patients | Sample type | Patient
5 | PFO pre OP | Stroke
8 | Patient matched PFO post OP | Stroke
Collaboration with Dr. M. Ning, Harvard, MGH, on PFO Stroke
When the atrial septum does not close properly, it is called a patent foramen ovale or PFO.
SIEVE experiment for the PFO stroke study
Sample groups were identified in SIEVE at the beginning of the analysis
Number of patients | Sample type | Patient
5 | PFO pre OP | Stroke
8 | Patient matched PFO post OP | Stroke
SIEVE data demonstrated high reproducibility and robustness of measurements
Reconstructed ion chromatogram of an example frame (not differentially expressed)
Whisker plot of expression ratios for all 13 peptides identified for protein gi119372317. Gray area represents 90% confidence interval for expected protein ratio.
3575 unique peptides and 263 proteins were identified in the study with high confidence. 128 were differentially expressed (determined by ratio).
ROC* analysis: How can we quickly rank the potential “usefulness” of putative biomarkers for clinical research?
Why? Expression ratio and P value may not necessarily be specific to the pathology.
How can we query the data and test the classification power of the target analytes?
• Create ROC curves by plotting the true positive rate against the false positive rate while adjusting the criteria threshold. The area under the curve (AUC) is a measurement of classification power.
• Use AUC to select optimal candidates and discard suboptimal candidates.
• AUC values for a useful marker range from 0.5 (no better than chance) to 1.0. An AUC of 1.0 indicates a specificity and sensitivity of 100%.
• Generate AUC values for individual markers and marker ratios.
*Receiver Operating Characteristic (a classification model)
[Plot: example ROC curve, axes labeled Specificity and Sensitivity]
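The AUC ranking described on this slide can be illustrated with the rank-based form of the ROC AUC: the probability that a randomly chosen sample from one group scores higher than a randomly chosen sample from the other, with ties counted as one half. This is a minimal sketch, not the SIEVE 1.3 implementation; the intensities are hypothetical, and taking the larger of AUC and 1 - AUC is an assumption made so that markers that decrease score as well as markers that increase.

```python
def roc_auc(case_values, control_values):
    """Fraction of (case, control) pairs in which the case value is higher; ties count 0.5."""
    wins = 0.0
    for c in case_values:
        for k in control_values:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))

def marker_auc(case_values, control_values):
    """Direction-agnostic AUC, always between 0.5 and 1.0."""
    auc = roc_auc(case_values, control_values)
    return max(auc, 1.0 - auc)

# Hypothetical normalized intensities for one protein in the two sample groups.
pre_op = [0.9, 1.1, 1.3, 1.0, 1.2]
post_op = [2.0, 2.2, 1.25, 2.4, 1.9, 2.1, 2.3, 2.5]

print(marker_auc(pre_op, post_op))
```

With groups as small as those in this study, an AUC near 1.0 is best read as a ranking aid for choosing candidates to verify, not as an estimate of clinical performance.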
Description | Peptides | Ratio* (Pre OP vs Post OP) | % standard error | StdDev (Pre OP vs Post OP) | P value (Pre OP vs Post OP) | Avg ROC AUC
_gi_4503635_ref_NP_000497.1_ prothrombin preproprotein [Homo sapiens] | 4 | 0.55 | 16.57 | 0.09 | 9.9E-20 | 1.00
_gi_261878616_ref_NP_001159907.1_ inter_alpha_trypsin inhibitor heavy chain H1 isoform c [Homo sapiens] | 5 | 0.48 | 19.94 | 0.10 | 9.9E-20 | 1.00
_gi_283806712_ref_NP_001164609.1_ clusterin isoform 3 [Homo sapiens] | 6 | 0.53 | 16.32 | 0.09 | 9.9E-20 | 0.99
_gi_70778918_ref_NP_002207.2_ inter_alpha_trypsin inhibitor heavy chain H2 [Homo sapiens] | 16 | 0.51 | 9.27 | 0.05 | 9.9E-20 | 0.99
_gi_32483410_ref_NP_000574.2_ vitamin D_binding protein precursor [Homo sapiens] | 7 | 0.45 | 19.76 | 0.09 | 9.9E-20 | 0.99
_gi_41393602_ref_NP_958850.1_ complement C1s subcomponent precursor [Homo sapiens] | 3 | 0.56 | 18.32 | 0.10 | 9.9E-20 | 0.99
_gi_4502261_ref_NP_000479.1_ antithrombin_III precursor [Homo sapiens] | 12 | 0.31 | 13.57 | 0.04 | 9.9E-20 | 0.98
_gi_31542984_ref_NP_002209.2_ inter_alpha_trypsin inhibitor heavy chain H4 isoform 1 precursor [Homo sapiens] | 19 | 0.45 | 13.00 | 0.06 | 9.9E-20 | 0.97
_gi_50659080_ref_NP_001076.2_ alpha_1_antichymotrypsin precursor [Homo sapiens] | 11 | 0.49 | 12.94 | 0.06 | 9.9E-20 | 0.96
_gi_239752152_ref_XP_002348153.1_ PREDICTED: hypothetical protein XP_002348153 [Homo sapiens] | 3 | 0.56 | 16.58 | 0.09 | 9.9E-20 | 0.96
_gi_73858570_ref_NP_001027466.1_ plasma protease C1 inhibitor precursor [Homo sapiens] | 9 | 0.57 | 11.69 | 0.07 | 9.9E-20 | 0.96
_gi_38016947_ref_NP_001726.2_ complement C5 preproprotein [Homo sapiens] | 8 | 0.60 | 16.35 | 0.10 | 9.9E-20 | 0.96
_gi_4557321_ref_NP_000030.1_ apolipoprotein A_I preproprotein [Homo sapiens] | 13 | 0.54 | 10.15 | 0.05 | 9.9E-20 | 0.96
_gi_62739186_ref_NP_000177.2_ complement factor H isoform a precursor [Homo sapiens] | 4 | 0.60 | 19.42 | 0.12 | 9.9E-20 | 0.95
_gi_4557871_ref_NP_001054.1_ serotransferrin precursor [Homo sapiens] | 16 | 0.21 | 13.61 | 0.03 | 9.9E-20 | 0.95
_gi_4557485_ref_NP_000087.1_ ceruloplasmin precursor [Homo sapiens] | 22 | 0.37 | 13.15 | 0.05 | 9.9E-20 | 0.95
_gi_296080754_ref_NP_001171670.1_ fibrinogen beta chain isoform 2 preproprotein [Homo sapiens] | 18 | 0.21 | 14.15 | 0.03 | 9.9E-20 | 0.95
_gi_70906437_ref_NP_000500.2_ fibrinogen gamma chain isoform gamma_A precursor [Homo sapiens] | 16 | 0.54 | 9.85 | 0.05 | 9.9E-20 | 0.94
_gi_169214179_ref_XP_001724196.1_ PREDICTED: similar to complement component 3 [Homo sapiens] | 12 | 0.49 | 14.89 | 0.07 | 9.9E-20 | 0.94
_gi_4557325_ref_NP_000032.1_ apolipoprotein E precursor [Homo sapiens] | 9 | 0.45 | 18.63 | 0.08 | 9.9E-20 | 0.94
Top 21 single proteins with highest ROC AUC for PFO Stroke Study
* Ratio = PRE OP/POST OP
Biological context? Ingenuity Pathways Analysis (IPA)
Top network: Lipid Metabolism
Top physiological system development and function: Hematological system
Top disease: Neurological Disease
Top canonical pathways:
• Acute phase signaling
• Coagulation system
• Complement system
• Intrinsic Prothrombin Pathway
• Extrinsic Prothrombin Pathway
The entire PFO stroke dataset was uploaded and analyzed with IPA
Top 2 ROC AUC candidates, selected literature references
Clin Chim Acta. 2009 Apr;402(1-2):160-3. Inter-alpha-trypsin inhibitor heavy chain 4 is a novel marker of acute ischemic stroke. Kashyap RS, Nayak AR, Deshpande PS, Kabra D, Purohit HJ, Taori GM, Daginawala HF. Biochemistry Research Laboratory, Central India Institute of Medical Sciences, 88/2 Bajaj Nagar Nagpur-10, India.
Stroke. 2007 Jul;38(7):2070-3. Epub 2007 May 24. Prothrombotic mutations as risk factors for cryptogenic ischemic cerebrovascular events in young subjects with patent foramen ovale. Botto N, Spadoni I, Giusti S, Ait-Ali L, Sicari R, Andreassi MG. CNR Institute of Clinical Physiology, G. Pasquinucci Hospital, Massa, Italy.
Description | Avg ROC AUC
_gi_4503635_ref_NP_000497.1_ prothrombin preproprotein [Homo sapiens] | 1.00
_gi_261878616_ref_NP_001159907.1_ inter_alpha_trypsin inhibitor heavy chain H1 isoform c [Homo sapiens] | 1.00
Verification and translation of putative biomarkers into targeted assays using SRM and Pinpoint™ Software
Pinpoint software was developed (at BRIMS) to make SRM assays easy, automated and efficient
Pinpoint workflow:
• List of Targeted Proteins
• Inputs: Discovery data (Proteome Discoverer, SIEVE, Peptide Atlas, NIST, GPM), Recombinant Protein, Heavy-Labeled Peptides, QC Standards
• Exhaustive List: Peptides, Transitions
• Identify and Verify: Best Peptides, Best Transitions
• Refine Transition List, Optimize LC Gradient
• Verify the LC-SRM Assay with Recombinant Digests
• Analyze Biological Samples
[Diagram labels: Pinpoint; Pinpoint algorithmic prediction]
Pinpoint provides assay throughput options…
5-10 peptides: Regular multiple SRM
50-100 peptides: Scheduled SRM (tSRM)
500-1000 peptides: tSRM + iSRM
5000-10000 peptides: tSRM + iSRM + Split-n-stitch
Automated scoring schemes help prioritize large analyses into high, medium and low quality bins.
And more…
• Single software to help iterative method building to go from protein list to absolute abundance
• Multi-threaded
• Extremely easy data and results sharing
• Customers can give video feedback
• Video help tutorials to get you started
iSRM – Quantifying and verifying low level biomarkers in biological matrices
[Figure: fragment ion map of peptide ELASGLFPVGFK with y3 through y10 labeled]
Primary SRM transition m/z 680.37 → 789.44, NL: 2.48E2
Primary SRM transition m/z 680.37 → 959.54, NL: 1.50E2
Data dependent SRM, primary and secondary SRM transitions, NL: 1.12E3
Ongoing collaboration with Dr. MingMing Ning, Mass General Hospital and Harvard University
Development of a multiplexed SRM assay for apolipoproteins
Application: cardiovascular disease and stroke
Targeted assay development for high abundance proteins
Ischemic vs hemorrhagic stroke
• About 80 percent of strokes are ischemic, caused by a blockage of the vessels that supply blood to the brain. More than 400,000 people in the United States every year are affected.
• About 20 percent of all strokes are hemorrhagic; this type of stroke involves the rupture of a blood vessel in or around the brain.
• TPA is the only treatment for ischemic stroke. It can only be given within 6 hrs of the event.
• If TPA is given to a hemorrhagic stroke patient, death can result.
• An assay that could accurately differentiate ischemic from hemorrhagic stroke quickly would be clinically useful.
Diagnosis for acute stroke is currently by:
• Neurological exam
• CAT scan
• MRI
• Lumbar puncture
Number of patients | Blood collection times | Sample type
53 | Upon admission | Ischemic stroke
26 | Upon admission | Hemorrhagic stroke
Development of a multiplexed assay for a panel of apolipoproteins: application to stroke
The relative levels of various apolipoproteins can be important biomarkers for heart disease, stroke, Alzheimer’s, diabetes and metabolic syndrome.
Typically, these proteins are individually measured in blood by immunoassay.
The availability of a multiplexed assay that could simultaneously and quantitatively measure a panel of apolipoproteins would be an extremely useful clinical research tool.
We decided to interrogate clinical samples to see if apolipoproteins could be used to classify different types of strokes.
Clinical Samples
Single day development of a multiplexed assay for a panel of apolipoproteins using Pinpoint
1. Import protein sequences and prior LC-MS/MS discovery data library for 10 apolipoproteins.
2. Choose optimal “proteotypic” peptides, i.e., highest intensity and unique. Narrow the list down to one peptide per protein.
3. Choose at least 5 fragment transitions per peptide. This ensures accurate identification of peptides. Create method and run sample triplicates.
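The selection logic in steps 2 and 3 can be sketched as follows. This is an illustration of the stated criteria (unique to one protein, highest intensity, top fragment transitions), not Pinpoint's algorithm; the function names, protein labels, sequences and intensities are all hypothetical placeholders.

```python
def pick_proteotypic(peptides):
    """peptides: list of (protein, sequence, intensity). Returns {protein: sequence}.

    Keeps, for each protein, the most intense peptide whose sequence maps to
    that protein only.
    """
    owners = {}
    for protein, seq, _ in peptides:
        owners.setdefault(seq, set()).add(protein)
    best = {}
    for protein, seq, intensity in peptides:
        if len(owners[seq]) != 1:
            continue  # shared between proteins, so not proteotypic
        if protein not in best or intensity > best[protein][1]:
            best[protein] = (seq, intensity)
    return {protein: seq for protein, (seq, _) in best.items()}

def pick_transitions(fragments, n=5):
    """fragments: list of (ion_label, intensity). Returns the n most intense labels."""
    ranked = sorted(fragments, key=lambda f: f[1], reverse=True)
    return [label for label, _ in ranked[:n]]

# Hypothetical discovery library entries (placeholder sequences).
library = [
    ("ProteinX", "AAAK", 5e5),
    ("ProteinX", "CCCK", 9e5),   # most intense, but shared with ProteinY
    ("ProteinY", "CCCK", 7e5),
    ("ProteinY", "DDDK", 3e5),
]
print(pick_proteotypic(library))

# Hypothetical fragment intensities for one chosen peptide.
fragments = [("y3", 10), ("y4", 50), ("y5", 40), ("y6", 5), ("y7", 30), ("y8", 20), ("y9", 1)]
print(pick_transitions(fragments))
```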
ROC analysis of apolipoprotein levels in hemorrhagic vs ischemic stroke patients: Single marker AUC
Top AUC for single marker
Apo CIII 0.80
Apo AI 0.76
Apo CII 0.70
Apo D 0.66
Apolipoprotein panel: Apo AI, Apo AII, Apo AIV, Apo B, Apo CI, Apo CII, Apo CIII, Apo D, Apo E, Apo H
ROC analysis of apolipoprotein levels in hemorrhagic vs ischemic stroke patients: Multi marker AUC
Top AUC for multi markers
Apo CIII and Apo AII 0.80
Apo CIII and Apo CI 0.87
Apo H and Apo AII 0.86
Apo AI and Apo CI 0.85
Apolipoprotein panel: Apo AI, Apo AII, Apo AIV, Apo B, Apo CI, Apo CII, Apo CIII, Apo D, Apo E, Apo H
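The slides do not say how two markers were combined into one multi-marker AUC. One simple option, consistent with the earlier mention of AUC values for marker ratios, is to score each patient by the ratio of the two marker levels and compute the AUC of that ratio. The sketch below assumes that approach; all values are hypothetical and chosen so that two confounded markers separate the groups better as a ratio than either does alone.

```python
def roc_auc(case_values, control_values):
    """Fraction of (case, control) pairs in which the case value is higher; ties count 0.5."""
    wins = 0.0
    for c in case_values:
        for k in control_values:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))

def marker_auc(case_values, control_values):
    """Direction-agnostic AUC, always between 0.5 and 1.0."""
    auc = roc_auc(case_values, control_values)
    return max(auc, 1.0 - auc)

def ratio_auc(a_cases, b_cases, a_controls, b_controls):
    """AUC of the per-patient ratio marker A / marker B."""
    case_ratios = [a / b for a, b in zip(a_cases, b_cases)]
    control_ratios = [a / b for a, b in zip(a_controls, b_controls)]
    return marker_auc(case_ratios, control_ratios)

# Hypothetical marker levels, four patients per stroke type.
a_ischemic, b_ischemic = [20, 8, 12, 16], [10, 4, 6, 8]
a_hemorrhagic, b_hemorrhagic = [7, 9, 5, 11], [7, 9, 5, 11]

print("marker A alone:", marker_auc(a_ischemic, a_hemorrhagic))
print("marker B alone:", marker_auc(b_ischemic, b_hemorrhagic))
print("ratio A/B:     ", ratio_auc(a_ischemic, b_ischemic, a_hemorrhagic, b_hemorrhagic))
```

Other combinations (for example a fitted linear score) are equally plausible readings of "multi marker"; any of them should be checked on samples that were not used to choose the pair.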
Development of an assay for PTH: Collaboration with Intrinsic BioProbes and Mayo Clinic
Clinical Chemistry, 2010
Targeted assay development for low abundance proteins
Dr. Ravinder Singh
Dr. Randall Nelson
The large dynamic range of proteins in blood presents a technical hurdle to the development of SRM assays for biomarkers present in low abundance.
PTH is secreted into the circulatory system to produce healthy concentrations of ca 15-65 pg/mL; enrichment is therefore required for mass spec detection.
PTH range
Intrinsic BioProbes/ThermoFisher PTH assay platform for enrichment of low abundance proteins using MSIA (Mass Spec Immuno Assay)
Clinical samples: capture on AB activated tips
MSIA Tip (affinity capture) → Versette ALH (automated processing) → TSQ Vantage MS (quantitative analysis)
• Conventional PTH assays typically rely on two-antibody recognition systems, i.e., ELISA.
• Immunoassays cannot accurately differentiate between full length (PTH aa1-84) and clinically important variants (aa7-84 and others).
• There is a need for more specific assays that can accurately quantify different clinical variants.
Not all immunocapture/immunoprecipitation methods can deliver the necessary recovery and signal
Antibody capture method | Analyte | Location tested | LOD in matrix (pg/mL) | Analyte MW | LOD in matrix (pmol/L)
SISCAPA | Troponin | Addona et al Clin Chem 2009, 55:1108-1117 | 600 | 20K | 50
SISCAPA | Thyroglobulin | Hoofnagle et al Clin Chem 2008, 54:1796-1804 | 2600 | 650K | 4
96 well ELISA plate | PTH | Thermo BRIMS unpublished | 250 | 10K | 30
Magnetic beads | PTH | Thermo BRIMS unpublished | 200 | 10K | 25
MSIA Tip | PTH | Lopez et al Clin Chem 2010, 56:281-290 | 8 | 10K | 1
Development of a PTH assay: Top down analysis confirmed that PTH is heterogeneous and variants have clinical relevance
[Figure: top-down mass spectra of PTH from renal failain samples (spectra at 3X), relative intensity vs m/z. Range m/z 4200 to 6200 with labeled variants 37-84, 38-84, 38-77, 34-84, 28-84, 48-84, 45-84, 34-77 and 37-77; range m/z 9325 to 9525 with intact 1-84]
We chose 4 tryptic monitoring and 2 variant specific peptides for SRM
PTH Variant Map (by residue number, N-terminus to residue 84)
Variants: [1-84], [7-84], [34-84], [37-84], [38-84], [45-84], [28-84], [48-84], [34-77], [37-77], [38-77]
SRM fragments:
[1-13] SVSEIQLMHNLGK
[7-13] LMHNLGK (variant specific)
[14-20] HLNSMER
[28-44] LQDVHNFVALGAPLAPR
[34-44] FVALGAPLAPR (variant specific)
[73-80] ADVNVLTK
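To make the variant map concrete, the sketch below checks which PTH variants span each SRM fragment, using only the residue ranges listed on the slide. The pairing of LMHNLGK with [7-13] is inferred by elimination from the other labeled peptides. Containment is a necessary condition only: whether a digest actually releases a given peptide from a given variant also depends on the cleavage sites, so a fragment that starts exactly at a variant's N-terminus (for example [7-13] in [7-84]) is what makes it variant specific.

```python
# Residue ranges copied from the variant map on the slide.
VARIANTS = [(1, 84), (7, 84), (34, 84), (37, 84), (38, 84), (45, 84),
            (28, 84), (48, 84), (34, 77), (37, 77), (38, 77)]

SRM_FRAGMENTS = {
    "SVSEIQLMHNLGK": (1, 13),
    "LMHNLGK": (7, 13),            # range inferred by elimination
    "HLNSMER": (14, 20),
    "LQDVHNFVALGAPLAPR": (28, 44),
    "FVALGAPLAPR": (34, 44),
    "ADVNVLTK": (73, 80),
}

def variants_containing(fragment):
    """Variants whose residue range fully covers the fragment's range."""
    start, end = fragment
    return [v for v in VARIANTS if v[0] <= start and end <= v[1]]

def starts_at_variant_terminus(fragment):
    """Variants whose N-terminus coincides with the fragment's first residue."""
    return [v for v in variants_containing(fragment) if v[0] == fragment[0]]

for sequence, residues in SRM_FRAGMENTS.items():
    covered = variants_containing(residues)
    print(f"{sequence} {residues}: spanned by {len(covered)} variants")
```

Read this way, SVSEIQLMHNLGK can only come from intact [1-84], while ADVNVLTK is spanned by every variant that extends to residue 84.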
Standard curves for PTH peptide SRM assays demonstrate high precision
LQDVHNFVALGAPLAPR
SVSEIQLMHNLGK
LOD was estimated at ca 8 pg/mL and LOQ was calculated to be ca 30 pg/mL.
R2 = 0.93, %CV < 10
R2 = 0.98, %CV < 10
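A standard curve of this kind is typically summarized by a least-squares line and its R2. The sketch below shows that calculation and the back-calculation of a concentration from a measured response. It is a generic ordinary least-squares fit, not necessarily the regression used for these curves, and the concentrations and responses are hypothetical.

```python
def linear_fit(x, y):
    """Ordinary least-squares line y = slope * x + intercept, plus R squared."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot

def back_calculate(response, slope, intercept):
    """Concentration that corresponds to a measured response on the fitted line."""
    return (response - intercept) / slope

# Hypothetical calibration points: concentration (pg/mL) and peak area ratio.
concentration = [0.0, 50.0, 100.0, 200.0, 400.0]
area_ratio = [0.02, 0.26, 0.49, 1.05, 1.98]

slope, intercept, r_squared = linear_fit(concentration, area_ratio)
print(f"slope={slope:.5f} intercept={intercept:.3f} R2={r_squared:.3f}")
print(f"unknown at ratio 0.75: {back_calculate(0.75, slope, intercept):.0f} pg/mL")
```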
Differential expression ratios of PTH peptides in renal failure vs normal samples; ratios ranged from 4.4 to 12.3
[Figure: five bar-chart panels of raw signal intensity, renal vs control, one per peptide: LQDVHNFVALGAPLAPR (aa28-44), SVSEIQLMHNLGK (aa1-13), HLNSMER (aa14-20), FVALGAPLAPR (aa34-44), ADVNVLTK (aa73-80). Ratios shown on the panels: 7.6, 7.5, 12.3, 9.2, 4.4]
Summary
• An integrated workflow for quantitative, label-free proteomic analysis facilitates discovery
• Important components of a discovery platform include powerful instrumentation and software
• Results from discovery experiments can be translated into targeted assays for biomarker verification
Acknowledgements
BRIMS TEAM
Mary Lopez, Director
David Sarracino, Manager, Biomarker Workflows
Bryan Krastins, Biomarker Scientist
Amol Prakash, Bioinformatic Scientist
Michael Athanas, Software Consultant
Taha Rezai, Quantitative Proteomics Scientist
Jennifer Sutton, Manager, Biomarker Research
Thermo Fisher: Scott Peterman, Amy Zumwalt, Andreas Huhmer, Bernard Delanghe
IBI, ASU Biodesign Institute: Randall Nelson, Dobrin Nedelkov, Paul Oran, Chad Borges
Mass General Hospital, Harvard U.: MingMing Ning, Ferdinando S Buonanno, Eng H Lo
Mayo Clinic: Ravinder Singh, Dave Barnidge
Two-Pass workflow LC configuration
• Proxeon Easy-nLC
• Trap column: 100 um x 5 cm PS-DVB, 5 um particle (15-20 um for dirty samples), 300 A pore
• Loading flow rates: 5 uL/min for 5 um traps; 15-20 uL/min for 15-20 um particle traps
• Resolving column: 100 um x 25 cm C18AQ, 200 A
• Buffer A: 5% methanol, 0.2% formic acid in water
• Buffer B: 90% acetonitrile, 0.2% formic acid in water
• Thermo Nanospray Source
• Instrument tuned on angiotensin 1
• Lock masses used: common polysiloxane and phthalates