mass spectrometry and proteomics - lecture 5...matthias trost newcastle university...

Mass Spectrometry and Proteomics - Lecture 5

Matthias Trost
Newcastle University

[email protected]

Previously

• Proteomics
• Sample prep


Lecture 5

• Quantitation techniques
• Search Algorithms
• Proteomics software


Current limitations of MS-based Proteomics

Bantscheff et al, Anal Bioanal Chem, 2007

• Cellular proteins span a wide range of expression and current mass spectrometric technologies typically sample only a fraction of all the proteins present in a sample.

• Due to limited data quality, only a fraction of all identified proteins can also be reliably quantified.


Limitations of Proteomics – concentration of proteins in plasma

Anderson & Anderson, MCP, 2002


Quantitation techniques

Label-free
• Ion intensity
• Spectral counting

Chemical isotopic labeling
• ICAT
• iTRAQ/TMT
• mTRAQ
• Formaldehyde label
• Enzymatic label

Metabolic isotopic labeling
• SILAC
• 15N


The three different spectral sources of quantitative information

Wilm, Proteomics, 2010


Quantitation methods

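• Isotope label (SILAC, ICAT, dimethyl label etc.)
• Fragmentation-based label (iTRAQ)
• Label-free

[Figure: schematic MS and MS/MS spectra for the three methods; labeled peak pairs are separated by X Da.]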


Quantitation strategies

Bantscheff et al, Anal Bioanal Chem, 2007


Characteristics of quantitative MS methods

Bantscheff et al, Anal Bioanal Chem, 2007


Label-free quantitation

• MASCOT
• Identification-driven peptide assignment

[Figure: workflow for Condition A and Condition B: peak detection (in triplicate) for each condition, hierarchical clustering, and MS/MS.]


Label-free proteomics

Advantages and Disadvantages

+ Lower complexity
+ Lower cost
+ Primary tissue possible
(+) Repetitions increase identification rates

- High LC-reproducibility necessary

- Good clustering dependent on high mass accuracy

- Several peptides for reliable quantitation required

[Figure: example quantitation of the phosphopeptide RLEIpSPDpSpSPER in Cond. A and Cond. B. Stdev Cond. A: 0.089; Stdev Cond. B: 0.067; Ratio Cond. A/Cond. B: 0.49]


Another label-free quantitation: Spectral counting

• The number of spectra matched to peptides from a protein is used as a surrogate measure of protein abundance.

• As the sampling of peptides in a mass spectrometer usually depends on the peptides’ intensities, spectral counting has reasonable statistical significance.

• Spectral counting is cheaper, easier to implement and does not require highly reproducible data.

• However, it still requires thorough computational and statistical analysis.

• Modern mass spectrometers are becoming too sensitive and fast for this type of quantitation.
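
As a minimal sketch of spectral counting, the snippet below counts the spectra matched to each protein in two conditions and compares the counts. The protein names, the lists of matches and the pseudocount of 1 are illustrative assumptions, not data from the lecture.

```python
from collections import Counter

def spectral_counts(psm_proteins):
    """Count how many identified spectra were matched to each protein."""
    return Counter(psm_proteins)

def count_ratio(counts_a, counts_b, protein, pseudocount=1):
    """Ratio of spectral counts A/B; the pseudocount avoids division by zero."""
    return (counts_a[protein] + pseudocount) / (counts_b[protein] + pseudocount)

# Hypothetical input: one protein name per identified spectrum.
cond_a = ["ProtX", "ProtX", "ProtX", "ProtY"]
cond_b = ["ProtX", "ProtY", "ProtY", "ProtY", "ProtY", "ProtY", "ProtY"]

counts_a = spectral_counts(cond_a)
counts_b = spectral_counts(cond_b)
ratio_x = count_ratio(counts_a, counts_b, "ProtX")  # (3 + 1) / (1 + 1)
ratio_y = count_ratio(counts_a, counts_b, "ProtY")  # (1 + 1) / (6 + 1)
```

With so few spectra per protein the ratios are coarse, which is one reason the approach still needs careful statistics.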


Isobaric tag for relative and absolute quantitation (TMT or iTRAQ)

• Reacts with N-termini and other primary amines of peptides.

• Uses a reporter group for quantification that can be identified in MS/MS spectra.

• Another labeled group serves as a balancer.

https://www.thermofisher.com/


Isobaric tag for relative and absolute quantitation (TMT or iTRAQ)

• Quantification is done in MS/MS mode (low intensity!)

• Once labeled with TMT or iTRAQ, the 4/6/8/10 individual samples are pooled for further processing and analysis.

• During subsequent MS/MS of the peptides, each isobaric tag produces a unique reporter ion that identifies which sample the peptide originated from and reports its relative abundance.
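
The reporter-ion readout can be sketched as follows: sum the MS/MS intensity found near each reporter m/z and express each channel as a fraction of the total. The reporter m/z values are approximate 6-plex values quoted from memory and should be checked against the vendor documentation; the peak list and the 0.01 tolerance are hypothetical.

```python
# Approximate reporter ion m/z values for a 6-plex tag (assumption, verify before use).
REPORTER_MZ = {
    "126": 126.128, "127": 127.125, "128": 128.134,
    "129": 129.131, "130": 130.141, "131": 131.138,
}

def reporter_intensities(peaks, reporter_mz=REPORTER_MZ, tol=0.01):
    """Sum the intensity of MS/MS peaks lying within tol of each reporter m/z."""
    result = {channel: 0.0 for channel in reporter_mz}
    for mz, intensity in peaks:
        for channel, target in reporter_mz.items():
            if abs(mz - target) <= tol:
                result[channel] += intensity
    return result

def relative_abundance(intensities):
    """Express each channel as a fraction of the summed reporter signal."""
    total = sum(intensities.values())
    return {ch: (val / total if total else 0.0) for ch, val in intensities.items()}

# Hypothetical MS/MS peak list as (m/z, intensity); the last peak is a fragment ion.
peaks = [(126.128, 100.0), (127.125, 200.0), (128.134, 100.0),
         (129.131, 400.0), (130.141, 100.0), (131.138, 100.0),
         (175.119, 5000.0)]

channels = reporter_intensities(peaks)
fractions = relative_abundance(channels)
```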

Gingras et al, Nat Rev Mol Cell Biol, 2007


Isobaric tag for relative and absolute quantitation (iTRAQ or TMT)

+ Up to 11 samples (11-plex) can be quantified at the same time.

+ Saves instrument time.

- Quite expensive.

- Low dynamic range.

- Cannot be performed in most ion-trap instruments, as they do not reach this low mass range.

- Non-changing peptides are favored to be identified.

- Large mass addition to peptides.

- High ratios are suppressed by other co-eluting peptides.

www.thermo.com

Ratio compression in TMT experiments

Ow, J Prot Res, 2009; Ting et al, Nature Methods, 2011
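
A small worked example may help show why co-isolated peptides compress ratios. It assumes, purely for illustration, that the interfering peptides do not change between samples and therefore add the same reporter signal to both channels; all intensities are hypothetical.

```python
def observed_ratio(target_a, target_b, interference):
    """Reporter ratio A/B when the same interfering signal is added to both channels."""
    return (target_a + interference) / (target_b + interference)

# True 10:1 change, measured without and with co-isolated background.
clean = observed_ratio(1000.0, 100.0, interference=0.0)         # 1000 / 100
compressed = observed_ratio(1000.0, 100.0, interference=100.0)  # 1100 / 200
```

The larger the background relative to the weaker channel, the closer the measured ratio is pulled towards 1.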

Reducing ratio compression by using Synchronous Precursor Selection (SPS)


Formaldehyde/dimethyl label

• Samples are labeled with heavy and light formaldehyde on their primary amines (N-termini, Lys)

• Relatively cheap and simple.

• Can be used on virtually any sample.

• Quite large mass difference between samples.

• Problematic retention time shifts in long LC runs due to deuterium.

Chen et al, Anal Chem, 2003; Boersema et al, Proteomics, 2008


Formaldehyde/dimethyl label

Chen et al, Anal Chem, 2003


Enzymatic isotope label

• Often incomplete incorporation of the label

• Further disadvantage: introduction of 18O at acidic side chains

Miyagi et al, Mass Spec Rev, 2006


Stable isotope labeling with amino acids in cell culture (SILAC)

• Cells are grown with “normal” and heavy isotope amino acids.

+ The isotopically labeled peptides are chemically (almost) identical (retention time etc.).

+ The different samples are mixed at a very early step during sample preparation.

- Labeled amino acids (Lys/Arg) might be metabolized to other amino acids.

- Expensive for large amounts of cells.

- Not for primary tissue.

- Increases complexity of the sample.

- Some cell types do not grow well in dialysed serum.

commons.wikimedia.org
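
To make the light/heavy pairing concrete, the sketch below computes how far apart the two peaks of a SILAC pair sit in m/z and the heavy/light intensity ratio. It uses the +8.0142 Da shift of 13C6 15N2 lysine quoted in the NeuCode slide; the charge state and intensities are hypothetical.

```python
LYS8_SHIFT = 8.0142  # Da, lysine labeled with 13C6 and 15N2

def silac_mz_spacing(n_labeled_residues, charge, shift=LYS8_SHIFT):
    """m/z distance between the light and heavy peaks of one peptide."""
    return n_labeled_residues * shift / charge

def heavy_to_light_ratio(heavy_intensity, light_intensity):
    """Relative abundance of the heavy sample versus the light sample."""
    return heavy_intensity / light_intensity

# A doubly charged peptide with one labeled lysine (hypothetical example).
spacing = silac_mz_spacing(n_labeled_residues=1, charge=2)
ratio = heavy_to_light_ratio(heavy_intensity=3.0e6, light_intensity=1.5e6)
```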

Neutron encoding (NeuCode) SILAC

• Makes use of the subtle mass differences caused by nuclear binding energy variation in stable isotopes (“mass defect”).

• For example, labelling with lysine carrying 2H8 (+8.0502 Da) and lysine carrying 13C6 and 15N2 (+8.0142 Da).

• Can only be resolved with very high resolution (>200,000).

• In a low-resolution (<15,000) MS/MS scan, the peaks overlap and are indistinguishable, so both peaks add to the intensity.

• Theoretically, up to 39 isotopologues of lysine are possible.
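
The resolution requirement follows from the size of the mass difference. The sketch below uses the two lysine shifts quoted above and the usual definition R = m/Δm (Δm taken as the peak width); the m/z of 827 comes from the figure caption, while the charge state of 2 is an assumption. The result is only the resolving power at which the spacing equals one peak width, so cleanly separated peaks need several times more.

```python
LYS_2H8 = 8.0502        # Da, lysine labeled with 2H8
LYS_13C6_15N2 = 8.0142  # Da, lysine labeled with 13C6 and 15N2

def neucode_mz_spacing(charge, n_lysines=1):
    """m/z spacing between the two NeuCode partners of a peptide."""
    return n_lysines * (LYS_2H8 - LYS_13C6_15N2) / charge

def resolving_power_at_one_peak_width(mz, spacing):
    """R = m / delta_m, where delta_m is set equal to the peak spacing."""
    return mz / spacing

mass_difference = neucode_mz_spacing(charge=1)  # about 0.036 Da, i.e. 36 mDa
spacing = neucode_mz_spacing(charge=2)          # about 0.018 m/z units
r_min = resolving_power_at_one_peak_width(827.0, spacing)
```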

Herbert et al, Nature Methods, 2013; Rose et al, Anal Chem, 2013

Neutron encoding (NeuCode) SILAC

Herbert et al, Nature Methods, 2013

(a) Mass calculations of the 39 isotopologues for a +8-Da lysine. Shown in solid black are the isotopologues used for the experiments presented here. (b) Theoretical calculations depicting the percentage of peptides that are resolved (full width at 1% maximum peak height) when spaced 12, 18 or 36 mDa apart for resolving powers (R) of 15,000–1,000,000. (c) Top, MS1 scan collected with typical 30,000 resolving power. Center, a selected precursor with m/z at 827 collected with 30,000 resolving power (black) and the signal recorded in a high-resolution MS1 scan (480,000 resolving power).

Protein Identification

• Either “de novo” (thus no database) or from genomic data.

• When genomic data is available, the software performs an in silico digestion of the whole database using the specific protease.

• The measured mass of the peptide and its MS/MS spectrum are compared to the theoretical mass and the theoretical spectrum.
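
An in silico digestion can be sketched in a few lines. The example assumes trypsin, cleaving C-terminal to K or R but not before P, with no missed cleavages; these rules are assumptions for illustration, since the slide only refers to "the specific protease". The sequence is the ubiquitin entry used later in the lecture, and the residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565

def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides

def peptide_mass(peptide):
    """Monoisotopic mass of the neutral peptide: residues plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

UBIQUITIN = ("MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGR"
             "TLSDYNIQKESTLHLVLRLRGG")
peptides = tryptic_digest(UBIQUITIN)
masses = {p: peptide_mass(p) for p in peptides}
```

The resulting peptide masses are what the search engine compares against each measured precursor mass.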


Search Engines

• Good search engines take common rules (high peaks after P) into account.

• The engine calculates a score from the number of matched peaks compared to the peaks present in the spectrum.

• This score is usually linked to a probability.

• Lately, search engines using spectral libraries have emerged. They are much faster and more accurate. However, they require good spectra for each peptide, ideally acquired on different kinds of instruments.
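
The idea of scoring by matched peaks can be illustrated with a deliberately simple score: the fraction of observed peaks that coincide with a theoretical b- or y-ion. This is a sketch only, not the scoring function of any real search engine; it considers singly charged fragments, ignores intensities, and the 0.02 tolerance is an assumption.

```python
# Monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276

def by_ions(peptide):
    """Theoretical m/z of singly charged b- and y-ions, interleaved per cleavage site."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = []
    for i in range(1, len(peptide)):
        ions.append(sum(masses[:i]) + PROTON)          # b-ion: N-terminal part
        ions.append(sum(masses[i:]) + WATER + PROTON)  # y-ion: C-terminal part
    return ions

def matched_fraction(observed_mz, theoretical_mz, tol=0.02):
    """Fraction of observed peaks explained by at least one theoretical fragment."""
    if not observed_mz:
        return 0.0
    matched = sum(
        1 for mz in observed_mz
        if any(abs(mz - t) <= tol for t in theoretical_mz)
    )
    return matched / len(observed_mz)

theoretical = by_ions("MQIFVK")
# Hypothetical observed spectrum: all true fragments plus two noise peaks.
observed = theoretical + [50.0, 60.0]
score_true = matched_fraction(observed, theoretical)
score_wrong = matched_fraction(observed, by_ions("VFIQMK"))
```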


For large scale proteomics, identification of peptides becomes a complex matching problem

Peptide ID & matching

[Figure: the proteome (UniProt database) is digested in silico to give peptide masses (Peptide A Mass, Peptide B Mass) and fragmented in silico to give fragment masses (Peptide A Fragment Masses, Peptide B Fragment Masses). These are matched against the observed mass (1000 ± 0.010 Da) and the corresponding MS2 data. Spectrum axes: m/z, intensity.]

The Database Search
1. MS1 filter
2. MS2 scoring
3. Probabilistic analysis

Database Search


Peptide A Mass: 999.980
Peptide B Mass: 999.993
Peptide C Mass: 1000.005
Peptide D Mass: 1000.010
Peptide E Mass: 1000.025

Database Search – MS1 filter

Peptide B Mass: 999.993
Peptide C Mass: 1000.005
Peptide D Mass: 1000.010
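
The MS1 filter step amounts to a tolerance window around the observed mass. A minimal sketch using the five candidate masses and the observed mass of 1000 ± 0.010 Da from the slides (the function name is made up for this example):

```python
def ms1_filter(candidates, observed_mass, tol):
    """Keep candidates whose mass lies within observed_mass +/- tol (in Da)."""
    # Rounding to 6 decimals guards against floating-point noise at the boundary.
    return {
        name: mass
        for name, mass in candidates.items()
        if round(abs(mass - observed_mass), 6) <= tol
    }

candidates = {
    "Peptide A": 999.980,
    "Peptide B": 999.993,
    "Peptide C": 1000.005,
    "Peptide D": 1000.010,
    "Peptide E": 1000.025,
}
survivors = ms1_filter(candidates, observed_mass=1000.0, tol=0.010)
```

Only peptides B, C and D remain and go on to MS2 scoring.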

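Database Search – theoretical MS/MS spectra

[Figure: the observed spectrum is compared with the theoretical spectrum of each remaining candidate. Scores: 9, 80, 1.]

Database Search – scoring

[Figure: theoretical spectra vs observed spectra, with scores. Peptide evidence: Peptide C, mass 1000.005, score 80.]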

Search constraints

• “Classic”
– Peptide/precursor mass accuracy
– MS/MS/fragment mass accuracy
– Fixed and variable modifications
– Enzyme (specificity)
– Instrument/type of ions generated

• Proposed
– Retention time


Commonly used Search Engines

• Mascot
• Sequest
• OMSSA
• X!Tandem
• Andromeda (within MaxQuant)
• …


Decoy/target strategy to determine FDR


$$\mathrm{PEP} = \frac{\#\,\text{hits in decoy database}}{\#\,\text{hits}} \quad \text{at a given score}$$

• Probability that a match of score 100 is incorrect: ~ 0
• Probability that a match of score 10 is incorrect: ~ 90%

>Ubiquitin
MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG

>Ubiquitin
MQIFVK

Target Database: MQIFVK
Decoy Database: VFIQMK
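
The target/decoy pair shown here (MQIFVK and VFIQMK) is consistent with reversing the peptide while leaving its C-terminal residue in place. The sketch below implements that rule; whether the software used in the lecture builds its decoys in exactly this way is an assumption.

```python
def decoy_peptide(peptide):
    """Reverse a peptide sequence but keep its C-terminal residue in place."""
    return peptide[:-1][::-1] + peptide[-1]

target = "MQIFVK"
decoy = decoy_peptide(target)
```

Keeping the C-terminal K or R means the decoy peptides have the same mass and still look like tryptic peptides.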

False-Discovery Rate

• Peptide/protein identification by mass spectrometry is a statistical analysis with false negatives and false positives.

• The false-discovery rate (FDR) is estimated by searching the data against a combined forward and reversed database. The number of hits from the reversed database is assumed to equal the number of false hits in the forward database.

• Please note that the FDR is on the identification level only, not on the quantitation level.

• Commonly accepted FDRs are <1%.
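
The target-decoy estimate can be sketched as follows: above a score threshold, the number of decoy hits is taken as the number of false target hits. The scores below are hypothetical, and real tools differ in detail (for example in how ties and the decoy count are handled).

```python
def fdr_at_threshold(psms, threshold):
    """Estimated FDR = decoy hits / target hits among matches scoring >= threshold."""
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0

def lowest_threshold_for_fdr(psms, max_fdr=0.01):
    """Lowest observed score whose estimated FDR stays at or below max_fdr."""
    passing = [score for score, _ in psms if fdr_at_threshold(psms, score) <= max_fdr]
    return min(passing) if passing else None

# Hypothetical matches as (score, is_decoy).
psms = [(100, False), (90, False), (80, False), (70, False),
        (60, True), (50, False), (40, True), (30, False)]

fdr_60 = fdr_at_threshold(psms, 60)          # 1 decoy / 4 targets
cutoff = lowest_threshold_for_fdr(psms)      # strictest setting, 1% FDR
```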


• We accept that a very small proportion of peptide identifications (usually set to 1%) will likely be false discoveries

• Hence, having multiple supporting peptides per protein is important for confident identification and quantitation

Considerations

• FDR estimation is challenging with small databases or when most of the database is identified. Always use bigger databases (for example, search a bacterial database together with the human database).

Considerations

• Choose your PTMs wisely

• Too many PTMs lead to combinatorial explosion and long database search times

• Common chemical modifications
– Deamidation (NQ)
– Gln → PyroGlu
– Oxidation (M)
– Carbamidomethylation (C)
– Acetyl (N-terminus)
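
The combinatorial explosion is easy to quantify: every residue that can carry a variable modification doubles the number of peptide forms the search engine has to consider. The sketch below counts these forms for one peptide; the phosphorylation entry is added only as an illustration and the function name is made up.

```python
def count_modified_forms(peptide, variable_mods):
    """Number of distinct modification states of a peptide.

    variable_mods maps a modification name to the residues it may occupy.
    Each residue is either unmodified or carries one applicable modification.
    """
    forms = 1
    for aa in peptide:
        options = 1 + sum(1 for residues in variable_mods.values() if aa in residues)
        forms *= options
    return forms

peptide = "TLSDYNIQK"
few_mods = {"Oxidation": "M", "Deamidation": "NQ"}
more_mods = {"Oxidation": "M", "Deamidation": "NQ", "Phospho": "STY"}

forms_few = count_modified_forms(peptide, few_mods)    # N and Q: 2 * 2
forms_more = count_modified_forms(peptide, more_mods)  # plus T, S, Y: 2 ** 5
```

This count applies to every peptide in the database, which is why search times grow so quickly.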

Considerations

[Figure: peptides VFIQMK, TLSDYNIQK, ESTLHLVLR, EGIPPDQQR and MQIFVK linked to Protein A, Protein B and Protein C.]

The vast majority of MS identification and quantitation is performed on peptides; information on proteins is through inference

The peptide to protein relationship is a “many to many” match

Considerations

[Figure: peptides VFIQMK, TLSDYNIQK, ESTLHLVLR, EGIPPDQQR and MQIFVK linked to Protein A, Protein B and Protein C.]

Assigning non-unique peptides:

“Occam’s Razor”Accept the simplest explanation that fits the observations

Non-unique peptides are assigned to proteins that have the most unique peptides
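
The assignment rule can be sketched as follows: count the unique peptides of each protein, then give every shared peptide to the candidate protein with the most unique peptides. The peptide-to-protein mapping below is hypothetical and is not read off the figure; ties are broken alphabetically, which is an arbitrary choice.

```python
from collections import Counter

def assign_shared_peptides(peptide_to_proteins):
    """Assign each peptide to the candidate protein with the most unique peptides."""
    unique_counts = Counter()
    for proteins in peptide_to_proteins.values():
        if len(proteins) == 1:
            unique_counts[next(iter(proteins))] += 1

    assignment = {}
    for peptide, proteins in peptide_to_proteins.items():
        # sorted() makes ties deterministic (alphabetical order wins).
        assignment[peptide] = max(sorted(proteins), key=lambda p: unique_counts[p])
    return assignment

# Hypothetical mapping of peptides to the proteins that contain them.
peptide_to_proteins = {
    "MQIFVK": {"Protein A"},
    "TLSDYNIQK": {"Protein A"},
    "ESTLHLVLR": {"Protein A", "Protein B"},
    "EGIPPDQQR": {"Protein B", "Protein C"},
    "LIFAGK": {"Protein C"},
}
assignment = assign_shared_peptides(peptide_to_proteins)
```

In this example Protein B receives no peptides at all, so it would not be reported.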

Considerations

• Evaluate distribution of data

• Normalise data

• Calculate standard deviation to set cutoffs
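
These three steps might look as follows in practice. The sketch log2-transforms the ratios, centres them on the median and derives cutoffs from the standard deviation; median centring and a cutoff of 2 standard deviations are common choices assumed here, and the ratios are hypothetical.

```python
import math
import statistics

def normalised_log2_ratios(ratios):
    """Log2-transform the ratios and centre them on their median."""
    logs = [math.log2(r) for r in ratios]
    median = statistics.median(logs)
    return [x - median for x in logs]

def sd_cutoffs(log_ratios, n_sd=2.0):
    """Lower and upper cutoffs at mean -/+ n_sd sample standard deviations."""
    mean = statistics.mean(log_ratios)
    sd = statistics.stdev(log_ratios)
    return mean - n_sd * sd, mean + n_sd * sd

# Hypothetical protein ratios (Cond. A / Cond. B) with a systematic 2-fold offset.
ratios = [2.0, 1.0, 4.0, 2.0, 2.0, 0.5, 2.0, 16.0]

log_ratios = normalised_log2_ratios(ratios)
lower, upper = sd_cutoffs(log_ratios)
changed = [r for r in log_ratios if r < lower or r > upper]
```

A histogram of `log_ratios` should be centred on zero after normalisation.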

Check your data: histograms

• Intensities vs intensities

• Reproducibility

Check your data: scatter plots

• Evaluate experimental reproducibility (0.05 is usual p-value cutoff)

• Appropriate fold change cutoff depends on standard deviation

Check your data: volcano plots

Databases

• UniProt databases are the standard for mouse, human and most other organisms.

• Ideally, they should be non-redundant.

• Can/should contain splice variants.

• The database should not be too small (a problem for bacteria), as the FDR calculation might be wrong.

• A common set of contaminants (keratin, BSA, milk proteins…) should be added to the searched database.


Software for MS ID and Quant

Software Platforms
• MaxQuant
• Trans Proteomic Pipeline (TPP)
• Proteome Discoverer
• PEAKS
• Scaffold

SRM/Targeted
• Skyline

ID only
• Mascot
• Sequest
• OMSSA
• Morpheus

de novo sequencing
• PEAKS

TMT quantitation
• COMPASS
• MaxQuant
• Proteome Discoverer
