Mass Spectrometry and Proteomics - Lecture 5 - Matthias Trost, Newcastle University, matthias.trost@ncl.ac.uk


Post on 23-Jan-2021


Page 1: Mass Spectrometry and Proteomics - Lecture 5...Matthias Trost Newcastle University matthias.trost@ncl.ac.uk Previously • Proteomics • Sample prep 144 Lecture 5 • Quantitation

Mass Spectrometry and Proteomics - Lecture 5 -

Matthias Trost
Newcastle University

matthias.trost@ncl.ac.uk


Previously

• Proteomics
• Sample prep


Lecture 5

• Quantitation techniques
• Search algorithms
• Proteomics software


Current limitations of MS-based Proteomics

Bantscheff et al, Anal Bioanal Chem, 2007

• Cellular proteins span a wide range of expression, and current mass spectrometric technologies typically sample only a fraction of all the proteins present in a sample.

• Due to limited data quality, only a fraction of all identified proteins can also be reliably quantified.


Limitations of Proteomics – concentration of proteins in plasma

Anderson & Anderson, MCP, 2002


Quantitation techniques

Label-free
• Ion intensity
• Spectral counting

Chemical isotopic labeling
• ICAT
• iTRAQ/TMT
• mTRAQ
• Formaldehyde label
• Enzymatic label

Metabolic isotopic labeling
• SILAC
• 15N


The three different spectral sources of quantitative information

Wilm, Proteomics, 2010


Quantitation methods

• Isotope label (SILAC, ICAT, dimethyl label etc.)
• Fragmentation-based label (iTRAQ)
• Label-free

[Figure: schematic MS and MS/MS spectra for each method, with a mass shift of X Da marked]


Quantitation strategies

Bantscheff et al, Anal Bioanal Chem, 2007


Characteristics of quantitative MS methods

Bantscheff et al, Anal Bioanal Chem, 2007


Label-free quantitation

[Figure: label-free workflow. Condition A and Condition B are each analysed in triplicate with peak detection; the detected peaks are grouped by hierarchical clustering; peptide assignment comes from MS/MS and is identification driven (MASCOT).]


Label-free proteomics

Advantages and Disadvantages

+ Lower complexity

+ Lower cost

+ Primary tissue possible

(+) Repetitions increase identification rates

- High LC-reproducibility necessary

- Good clustering dependent on high mass accuracy

- Several peptides for reliable quantitation required

[Figure: phosphopeptide RLEIpSPDpSpSPER in Cond. A and Cond. B. Stdev Cond. A: 0.089; Stdev Cond. B: 0.067; ratio Cond. A/Cond. B: 0.49]


Another label-free quantitation method: spectral counting

• The number of spectra matched to peptides from a protein is used as a surrogate measure of protein abundance.

• As the sampling of peptides in a mass spectrometer usually depends on the peptides’ intensities, spectral counting has reasonable statistical significance.

• Spectral counting is cheaper, easier to implement and does not require highly reproducible data.

• However, it still requires thorough computational and statistical analysis.

• Modern mass specs are getting too sensitive and too fast for this kind of quantitation.
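
The counting idea can be sketched in a few lines. This is only an illustration: the protein names, the PSM lists and the pseudocount are assumptions made up for the example, not part of the lecture.

```python
from collections import Counter


def spectral_counts(psm_proteins):
    """Count how many identified spectra (PSMs) map to each protein."""
    return Counter(psm_proteins)


def count_ratio(counts_a, counts_b, protein, pseudocount=1):
    """Ratio of spectral counts A/B. The pseudocount (an assumption here)
    avoids division by zero for proteins missing in one run."""
    return (counts_a[protein] + pseudocount) / (counts_b[protein] + pseudocount)


# Hypothetical PSM-to-protein assignments for two runs.
run_a = ["P1", "P1", "P1", "P2", "P3", "P1", "P2"]
run_b = ["P1", "P2", "P2", "P2", "P3", "P2", "P2"]

counts_a = spectral_counts(run_a)
counts_b = spectral_counts(run_b)
for protein in sorted(set(counts_a) | set(counts_b)):
    print(protein, counts_a[protein], counts_b[protein],
          count_ratio(counts_a, counts_b, protein))
```

Published spectral counting schemes usually also normalise for protein length and total spectra per run; that step is left out here to keep the sketch short.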


Isobaric tag for relative and absolute quantitation (TMT or iTRAQ)

• Reacts with N-termini and other primary amines of peptides.

• Uses a reporter group for quantification that can be identified in MS/MS spectra.

• Another labeled group serves as a balancer.

https://www.thermofisher.com/


Isobaric tag for relative and absolute quantitation (TMT or iTRAQ)

• Quantification is done in MS/MS mode (low intensity!)

• Once labeled with TMT or iTRAQ, the 4/6/8/10 individual samples are pooled for further processing and analysis.

• During subsequent MS/MS of the peptides, each isobaric tag produces a unique reporter ion that identifies which sample the peptide originated from and reports its relative abundance.

Gingras et al, Nat Rev Mol Cell Biol, 2007
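
As a rough sketch of reporter-ion quantitation, the reporter peaks can be picked out of an MS/MS peak list and expressed as fractions of the total reporter signal. The reporter m/z values below are the commonly quoted TMT 6-plex values and should be checked against the vendor documentation; the peak list, the tolerance and the function names are assumptions for illustration.

```python
# Approximate TMT 6-plex reporter ion m/z values (assumed here; check the
# vendor documentation for the reagent set actually used).
REPORTER_MZ = {
    "126": 126.1277, "127": 127.1248, "128": 128.1344,
    "129": 129.1315, "130": 130.1411, "131": 131.1382,
}


def reporter_intensities(peaks, tol=0.003):
    """Sum the intensity of MS/MS peaks within tol (in m/z) of each reporter."""
    result = {}
    for channel, mz in REPORTER_MZ.items():
        result[channel] = sum(i for m, i in peaks if abs(m - mz) <= tol)
    return result


def relative_abundance(intensities):
    """Express each channel as a fraction of the summed reporter signal."""
    total = sum(intensities.values())
    if total == 0:
        return {ch: 0.0 for ch in intensities}
    return {ch: val / total for ch, val in intensities.items()}


# Hypothetical MS/MS peak list: (m/z, intensity) pairs.
spectrum = [(126.1278, 100.0), (127.1249, 200.0), (128.1343, 100.0),
            (129.1316, 300.0), (130.1410, 200.0), (131.1381, 100.0),
            (175.1190, 5000.0)]  # last peak is a sequence ion, not a reporter

fractions = relative_abundance(reporter_intensities(spectrum))
print({ch: round(f, 2) for ch, f in fractions.items()})
```

Note how weak the reporter peaks are compared with the sequence ion in this made-up spectrum, which is the "low intensity" caveat from the slide.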


Isobaric tag for relative and absolute quantitation (iTRAQ or TMT)

+ Up to 11 samples (11-plex) can be quantified at the same time.

+ Saves instrument time.

- Quite expensive.

- Low dynamic range.

- Cannot be performed on most ion-trap instruments, as they do not reach this low mass range.

- Identification is biased towards non-changing peptides.

- Large mass addition to peptides.

- High ratios are suppressed by other co-eluting peptides.

www.thermo.com


Ratio compression in TMT experiments

Ow, J Prot Res, 2009; Ting et al, Nature Methods, 2011
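
A small worked example shows why co-isolated peptides compress ratios. It assumes, as a simplification, that the interfering peptides do not change between samples and therefore add the same signal to both reporter channels; all numbers are hypothetical.

```python
def observed_ratio(true_a, true_b, interference):
    """Reporter ratio A/B when co-isolated peptides add the same
    interfering signal to both channels (a simplifying assumption)."""
    return (true_a + interference) / (true_b + interference)


# Hypothetical reporter signals with a true 10-fold change.
for background in (0.0, 10.0, 50.0):
    print(background, observed_ratio(100.0, 10.0, background))
```

With no interference the 10-fold change is recovered; as the shared background grows, the observed ratio moves towards 1.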


Reducing ratio compression by using Synchronous Precursor Selection (SPS)


Formaldehyde/dimethyl label

• Samples are labeled with heavy and light formaldehyde on their primary amines (N-termini, Lys)

• Relatively cheap and simple.

• Can be used on virtually any sample.

• Quite large mass difference between samples.

• Problematic retention time shifts in long LC runs due to deuterium.

Chen et al, Anal Chem, 2003; Boersema et al, Proteomics, 2008


Formaldehyde/dimethyl label

Chen et al, Anal Chem, 2003


Enzymatic isotope label

• Further disadvantage: introduction of 18O at acidic side chains

• Often incomplete incorporation of the label

Miyagi et al, Mass Spec Rev, 2006


Stable isotope labeling with amino acids in cell culture (SILAC)

• Cells are grown with “normal” and heavy isotope amino acids.

+ The isotopically labeled peptides are chemically (almost) identical (retention time etc.).

+ The different samples are mixed at a very early step during sample preparation.

- Labeled amino acids (Lys/Arg) might be metabolized to other amino acids.

- Expensive for large amounts of cells.

- Not for primary tissue.

- Increases complexity of the sample.

- Some cell types do not grow well in dialysed serum.

commons.wikimedia.org


Neutron encoding (NeuCode) SILAC

• Makes use of the subtle mass differences caused by nuclear binding energy variation in stable isotopes (“mass defect”).

• For example, labelling with lysine carrying 2H8 (+8.0502 Da) and lysine carrying 13C6 and 15N2 (+8.0142 Da).

• Can only be resolved with very high resolution (>200,000).

• In a low-resolution (<15,000) MS/MS scan, the peaks overlap and are indistinguishable, thus both peaks add to the intensity.

• Theoretically, up to 39 isotopologues of lysine are possible.

Herbert et al, Nature Methods 2013; Rose et al, Anal Chem, 2013
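
The arithmetic behind the resolution requirement can be sketched as follows. The two mass shifts are the ones quoted above; the 2+ charge state and the use of R = (m/z) / Δ(m/z) as a rough lower bound are assumptions for illustration, and m/z 827 is borrowed from the precursor mentioned in the Herbert et al figure legend.

```python
def mz_spacing(mass_difference_da, charge):
    """Spacing in m/z between two isotopologue peaks of a peptide."""
    return mass_difference_da / charge


def minimum_resolving_power(mz, spacing):
    """Rough lower bound R = (m/z) / delta(m/z). Clean separation in
    practice needs a considerably higher R than this estimate."""
    return mz / spacing


# Mass shifts from the slide: 2H8 lysine vs 13C6 15N2 lysine.
delta = 8.0502 - 8.0142  # about 0.036 Da, i.e. 36 mDa

# Assumed example: one labelled lysine, a 2+ precursor at m/z 827.
spacing = mz_spacing(delta, charge=2)
print(round(delta * 1000, 1), "mDa mass difference")
print(round(spacing * 1000, 1), "mTh spacing at 2+")
print(round(minimum_resolving_power(827.0, spacing)), "minimum resolving power")
```

The estimate only says when two peaks start to separate. Quantifying them reliably needs peaks that are resolved close to the baseline, which is why the slide quotes a much higher figure.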


Neutron encoding (NeuCode) SILAC

Herbert et al, Nature Methods 2013

(a) Mass calculations of the 39 isotopologues for a +8-Da lysine. Shown in solid black are the isotopologues used for the experiments presented here. (b) Theoretical calculations depicting the percentage of peptides that are resolved (full width at 1% maximum peak height) when spaced 12, 18 or 36 mDa apart for resolving powers (R) of 15,000–1,000,000. (c) Top, MS1 scan collected with typical 30,000 resolving power. Center, a selected precursor with m/z at 827 collected with 30,000 resolving power (black) and the signal recorded in a high-resolution MS1 scan (480,000 resolving power).


Protein Identification

• Either “de novo” (thus no database) or from genomic data.

• When genomic data is available, the software performs an in silico digestion of the whole database using the specific protease.

• The measured peptide mass and MS/MS spectrum are compared to the theoretical peptide mass and the theoretical spectrum.
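
An in silico digestion can be sketched as below, using the ubiquitin sequence that appears later in the lecture. The cleavage rule (after K or R, not before P, no missed cleavages) is a common simplification of trypsin specificity and is an assumption here; the function names are made up for the example.

```python
import re

# Monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056

UBIQUITIN = ("MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGR"
             "TLSDYNIQKESTLHLVLRLRGG")


def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P
    (simplified trypsin rule, no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def peptide_mass(peptide):
    """Monoisotopic mass of the neutral, unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


for peptide in tryptic_digest(UBIQUITIN):
    print(peptide, round(peptide_mass(peptide), 4))
```

A real search engine would also generate peptides with missed cleavages and with the fixed and variable modifications selected for the search.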


Search Engines

• Good search engines take common rules (high peaks after P) into account.

• The engine calculates a score from the number of matched peaks compared to the peaks present in the spectrum.

• This score is usually linked to a probability.

• Lately, search engines using spectral libraries have emerged. They are much faster and more accurate. However, good spectra for each peptide are required, ideally acquired on different kinds of instruments.


For large scale proteomics, identification of peptides becomes a complex matching problem

Peptide ID & matching


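Database

[Figure: building the search database. The UniProt proteome is digested in silico to give peptide masses (Peptide A mass, Peptide B mass, ...), and each peptide is fragmented in silico to give its fragment masses (Peptide A fragment masses, Peptide B fragment masses, ...).]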
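
In silico fragmentation can be sketched as computing the b- and y-ion series of a peptide. Restricting the example to singly charged b and y ions of an unmodified peptide is an assumption made to keep it short; the function name is made up.

```python
# Monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_ions(peptide):
    """Singly charged b- and y-ion m/z values for every backbone cleavage.
    b ions are listed b1..b(n-1); y ions are listed y(n-1)..y1."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_ions.append(sum(masses[:i]) + PROTON)
        y_ions.append(sum(masses[i:]) + WATER + PROTON)
    return b_ions, y_ions


b_series, y_series = fragment_ions("MQIFVK")
print("b:", [round(m, 4) for m in b_series])
print("y:", [round(m, 4) for m in y_series])
```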


Database Search

Observed mass: 1000 ± 0.010 Da, with the corresponding MS2 data

[Figure: MS2 spectrum, intensity vs m/z]

The database search:
1. MS1 filter
2. MS2 scoring
3. Probabilistic analysis


Database Search – MS1 filter

Observed mass: 1000 ± 0.010 Da

Peptide A mass: 999.980
Peptide B mass: 999.993
Peptide C mass: 1000.005
Peptide D mass: 1000.010
Peptide E mass: 1000.025
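
The MS1 filter on this slide amounts to a tolerance check on the precursor mass. A minimal sketch using the slide's numbers (the function name and the rounding guard are assumptions for illustration):

```python
def ms1_filter(observed_mass, candidates, tolerance_da):
    """Keep candidates whose theoretical mass is within the tolerance.
    The difference is rounded to 6 decimals so that floating-point noise
    does not decide cases lying exactly on the tolerance boundary."""
    return [name for name, mass in candidates.items()
            if round(abs(mass - observed_mass), 6) <= tolerance_da]


# Candidate peptide masses from the slide.
candidates = {
    "Peptide A": 999.980,
    "Peptide B": 999.993,
    "Peptide C": 1000.005,
    "Peptide D": 1000.010,
    "Peptide E": 1000.025,
}

print(ms1_filter(1000.0, candidates, 0.010))
```

Peptides B, C and D pass the filter, which are the three candidates carried forward to MS2 scoring in the following slides. Search engines usually express this tolerance in ppm rather than Da.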


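Database Search – theoretical MS/MS spectra

Observed mass: 1000 ± 0.010 Da

The theoretical spectrum of each remaining candidate is scored against the observed spectra:

Peptide B, mass 999.993: score 9
Peptide C, mass 1000.005: score 80
Peptide D, mass 1000.010: score 1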


Database Search – scoring

Observed mass: 1000 ± 0.010 Da

Peptide evidence: Peptide C, mass 1000.005, score 80 (theoretical spectra compared with observed spectra)
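
A toy version of MS2 scoring counts how many theoretical fragment peaks find a partner in the observed spectrum. This is only a sketch: the peak lists, the tolerance and the scoring function are assumptions, and the resulting numbers are not on the same scale as the scores shown on the slide, which come from a real search engine.

```python
def count_matched_peaks(observed_mz, theoretical_mz, tol=0.02):
    """Number of theoretical fragment m/z values that have an observed
    peak within tol (in m/z)."""
    return sum(any(abs(o - t) <= tol for o in observed_mz)
               for t in theoretical_mz)


def simple_score(observed_mz, theoretical_mz, tol=0.02):
    """Toy score: fraction of theoretical fragments that were matched.
    Real search engines use more elaborate, probability-based scores."""
    if not theoretical_mz:
        return 0.0
    return count_matched_peaks(observed_mz, theoretical_mz, tol) / len(theoretical_mz)


# Hypothetical observed peaks and theoretical fragments per candidate.
observed = [147.11, 246.18, 393.25, 506.33, 634.39]
theoretical = {
    "Peptide B": [147.11, 260.20, 361.24, 474.33, 602.39],
    "Peptide C": [147.11, 246.18, 393.25, 506.33, 634.39],
    "Peptide D": [175.12, 274.19, 421.26, 534.34, 662.40],
}

scores = {name: simple_score(observed, frags)
          for name, frags in theoretical.items()}
best = max(scores, key=scores.get)
print(scores, "best:", best)
```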


Search constraints

• “Classic”
– Peptide/precursor mass accuracy
– MS/MS/fragment mass accuracy
– Fixed and variable modifications
– Enzyme (specificity)
– Instrument/type of ions generated

• Proposed
– Retention time


Commonly used Search Engines

• Mascot
• Sequest
• OMSSA
• X!Tandem
• Andromeda (within MaxQuant)
• …


Decoy/target strategy to determine FDR


Decoy/target strategy to determine FDR

$\mathrm{PEP} = \dfrac{\#\ \text{hits in decoy database}}{\#\ \text{hits}}$ at a given score

Probability that a match of score 100 is incorrect: ~0

Probability that a match of score 10 is incorrect: ~90%

Decoy/target strategy to determine FDR

>Ubiquitin
MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG

Target database: MQIFVK

Decoy database: VFIQMK
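
The slide's example (MQIFVK becoming VFIQMK) is consistent with reversing each peptide while keeping its C-terminal residue in place, so that decoys still look tryptic. Whether the software used in the lecture builds its decoys exactly this way is an assumption; the function names and the "REV_" prefix are made up for the sketch.

```python
def decoy_peptide(peptide):
    """Reverse the peptide but keep the C-terminal residue in place,
    matching the slide's example (MQIFVK -> VFIQMK)."""
    return peptide[:-1][::-1] + peptide[-1]


def target_decoy_database(target_peptides):
    """Combined list of (label, sequence) entries for targets and decoys."""
    entries = [("target", p) for p in target_peptides]
    entries += [("REV_decoy", decoy_peptide(p)) for p in target_peptides]
    return entries


for label, sequence in target_decoy_database(["MQIFVK", "TLSDYNIQK"]):
    print(label, sequence)
```

Decoys keep the length, composition and precursor mass of their targets, so they compete with the targets on equal terms in the MS1 filter.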


False-Discovery Rate

• Peptide/protein identification by mass spectrometry is a statistical analysis with false negatives and false positives.

• The false-discovery rate (FDR) is estimated by searching the data against a combined forward and reversed database. The number of hits from the reversed database is assumed to be equivalent to the number of false hits in the forward database.

• Please note that the FDR is on the identification level only, not on the quantitation level.

• Commonly accepted FDRs are <1%.
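
A minimal sketch of the target/decoy estimate, assuming the FDR at a score threshold is taken as the number of decoy hits divided by the number of target hits at or above that threshold. The scores and function names are hypothetical, and real tools differ in the exact formula they use.

```python
def fdr_at_threshold(hits, threshold):
    """Estimate the FDR as decoy hits / target hits among matches scoring
    at or above the threshold. hits is a list of (score, is_decoy)."""
    targets = sum(1 for s, d in hits if s >= threshold and not d)
    decoys = sum(1 for s, d in hits if s >= threshold and d)
    return decoys / targets if targets else 0.0


def lowest_threshold_for_fdr(hits, max_fdr=0.01):
    """Lowest observed score at which the estimated FDR is within max_fdr."""
    passing = [s for s, _ in hits if fdr_at_threshold(hits, s) <= max_fdr]
    return min(passing) if passing else None


# Hypothetical search results: (score, is_decoy).
hits = [(100, False), (90, False), (80, False), (70, False), (60, True),
        (50, False), (40, True), (30, False), (20, True), (10, True)]

print(fdr_at_threshold(hits, 60))
print(lowest_threshold_for_fdr(hits, max_fdr=0.01))
```

With only ten matches the estimate is very coarse; real data sets contain thousands of matches, which is what makes a 1% cutoff meaningful.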


Considerations

• We accept that a very small proportion of peptide identifications (usually set to 1%) will likely be false discoveries.

• Hence, having multiple supporting peptides per protein is important for confident identification and quantitation.


Considerations

• FDR estimation is challenging with small databases or when most of the database is identified. Always use bigger databases (for example, add a human database to a bacterial database).


Considerations

• Choose your PTMs wisely

• Too many PTMs lead to combinatorial explosion and long database search times

• Common chemical modifications
– Deamidation (NQ)
– Gln → PyroGlu
– Oxidation (M)
– Carbamidomethylation (C)
– Acetyl (N-terminus)
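
The combinatorial explosion can be illustrated by counting the modification states of a single peptide when each modifiable residue may or may not carry a variable modification. The function name is made up, and phosphorylation of S/T/Y is added to the example as an assumption because it is a typical variable modification; it is not in the list above.

```python
def count_modified_forms(peptide, variable_mods):
    """Number of distinct modification states of a peptide when each
    residue can independently carry one of the variable modifications
    that target it, or none (terminal modifications are ignored)."""
    forms = 1
    for aa in peptide:
        options = 1 + sum(1 for residues in variable_mods.values()
                          if aa in residues)
        forms *= options
    return forms


few_mods = {"Oxidation": "M"}
many_mods = {"Oxidation": "M", "Deamidation": "NQ", "Phospho": "STY"}

peptide = "TLSDYNIQK"
print(count_modified_forms(peptide, few_mods))
print(count_modified_forms(peptide, many_mods))
```

Every additional form is one more candidate to fragment in silico and score, for every peptide in the database, which is where the longer search times come from.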


Considerations

The vast majority of MS identification and quantitation is performed on peptides; information on proteins is obtained through inference.

The peptide-to-protein relationship is a “many to many” match.

[Figure: peptides VFIQMK, TLSDYNIQK, ESTLHLVLR, EGIPPDQQR and MQIFVK mapped to Protein A, Protein B and Protein C]


Considerations

Assigning non-unique peptides:

“Occam’s Razor”: accept the simplest explanation that fits the observations.

Non-unique peptides are assigned to the proteins that have the most unique peptides.

[Figure: peptides VFIQMK, TLSDYNIQK, ESTLHLVLR, EGIPPDQQR and MQIFVK mapped to Protein A, Protein B and Protein C]
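
The assignment rule can be sketched as follows. The peptide-to-protein map is hypothetical (the figure's actual connections are not recoverable from the transcript), and the alphabetical tie-break is an arbitrary choice for the example.

```python
def assign_peptides(peptide_to_proteins):
    """Assign each peptide to one protein: unique peptides go to their only
    protein; shared peptides go to the candidate with the most unique
    peptides (ties broken alphabetically, an arbitrary choice)."""
    unique_counts = {}
    for proteins in peptide_to_proteins.values():
        for protein in proteins:
            unique_counts.setdefault(protein, 0)
        if len(proteins) == 1:
            unique_counts[next(iter(proteins))] += 1
    assignment = {}
    for peptide, proteins in peptide_to_proteins.items():
        assignment[peptide] = min(
            proteins, key=lambda p: (-unique_counts[p], p))
    return assignment


# Hypothetical peptide-to-protein map.
mapping = {
    "MQIFVK": {"Protein A"},
    "TLSDYNIQK": {"Protein A"},
    "ESTLHLVLR": {"Protein A", "Protein B"},
    "EGIPPDQQR": {"Protein B", "Protein C"},
    "LIFAGK": {"Protein B"},
}

for peptide, protein in assign_peptides(mapping).items():
    print(peptide, "->", protein)
```

In this example Protein C receives no peptides at all, so the simplest explanation of the data does not require it to be present.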


Check your data: histograms

• Evaluate distribution of data

• Normalise data

• Calculate standard deviation to set cutoffs
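
These checks can be sketched with the standard library. Median centring of log2 ratios and a cutoff of two standard deviations are assumptions chosen for illustration, not recommendations from the lecture, and the ratios are made up.

```python
import math
import statistics


def normalised_log2_ratios(ratios):
    """Log2-transform ratios and centre them on their median."""
    logs = [math.log2(r) for r in ratios]
    centre = statistics.median(logs)
    return [x - centre for x in logs]


def outside_cutoff(log_ratios, n_sd=2.0):
    """Indices of values further than n_sd standard deviations from zero
    (n_sd = 2 is an illustrative choice)."""
    sd = statistics.stdev(log_ratios)
    return [i for i, x in enumerate(log_ratios) if abs(x) > n_sd * sd]


# Hypothetical protein ratios with a systematic 2-fold mixing error.
ratios = [2.0, 2.2, 1.8, 2.1, 1.9, 2.0, 16.0, 2.05, 1.95, 0.25]

normalised = normalised_log2_ratios(ratios)
print([round(x, 2) for x in normalised])
print("changing:", outside_cutoff(normalised))
```

Plotting the normalised values as a histogram shows whether the bulk of the proteins sits around zero, which is the assumption behind this kind of normalisation.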


Check your data: scatter plots

• Intensities vs intensities

• Reproducibility


Check your data: volcano plots

• Evaluate experimental reproducibility (0.05 is the usual p-value cutoff)

• The appropriate fold-change cutoff depends on the standard deviation


Databases

• UniProt databases are the standard for mouse, human and most other organisms.

• They should ideally be non-redundant.

• Can/should contain splice variants.

• The database should not be too small (a problem for bacteria), as the FDR calculation might be wrong.

• A common set of contaminants (keratin, BSA, milk proteins…) should be added to the searched database.


Software for MS ID and Quant

Software platforms
• MaxQuant
• Trans Proteomic Pipeline (TPP)
• Proteome Discoverer
• PEAKS
• Scaffold

SRM/Targeted
• Skyline

ID only
• Mascot
• Sequest
• OMSSA
• Morpheus

de novo sequencing
• PEAKS

TMT quantitation
• COMPASS
• MaxQuant
• Proteome Discoverer