
Introduction to Proteomics &

Bottom-up Proteomics

W. Andy Tao

Purdue University

[email protected]

What is Proteomics?

• Analysis of the entire PROTEin complement expressed by a genOME of a cell or tissue type (Marc Wilkins)
• Proteomics focuses on the state-related expression of proteins in biological samples
• Proteomics is the systematic analysis and documentation of proteins
• Proteomics identifies and quantifies proteins, and determines their localization, modifications, interactions, activities, and ultimately their functions

Used for MS Short Course at Tsinghua by R. Graham Cooks, Hao Chen, Zheng Ouyang, Andy Tao, Yu Xia and Lingjun Li

Historical Perspective

1. 1970s: 2D gels and separation of hundreds of proteins at once

2. Mid-1990s: huge expansion
• Easy-to-use, high-performance mass spectrometers
  – MALDI, ESI
  – High resolution, sensitivity, accuracy, MS/MS
  – Protein and peptide measurement and peptide fragmentation
• Genome projects
  – Human, bacterial, plant, other animals
  – Huge searchable databases (bioinformatics)

Considerable interest arose once the transcriptome was found to be a poor predictor of the proteome.

3. 21st century: consolidation
• Post-translational modifications
• Relative and absolute quantitation
• Targeted analyses
• Biological applications

It’s all about separation and identification!

• Separation
  – Multi-dimensional
  – Gel-based or gel-free
• Identification
  – Top-down (lectured by Dr. Yu Xia)
    • Analytes are proteins
    • ECD for the fragmentation of proteins
    • Almost exclusively in FT-ICR
  – Bottom-up
    • Analytes are peptides (digested from proteins)
    • CID is the most common method for fragmenting peptides
    • In any mass spectrometer
  – Middle-down
    • Large peptides (5k-20k Da)
    • Its approach is similar to top-down

• Why bottom-up to analyze peptides?
  – They are smaller and therefore easier to ionize for MS analysis
  – Fragment ion spectra are much easier to interpret
  – In general, it is possible to get much greater sequence coverage using a mixture of peptides that are analyzed individually
  – Using peptide separation techniques, it is possible to reduce a complex protein digest into simpler fractions for analysis
• Enzymatic digestion of proteins
  – Sequence-specific proteases: complete digestion
  – Non-specific proteases: partial digestion
• Chemical treatment of proteins
• The basic scheme for peptide generation and analysis:
  1. Protein isolation (PAGE, chromatography, etc.), optional
  2. Reduction and alkylation
  3. Enzymatic digestion or chemical cleavage
  4. Sample cleanup and/or peptide separation (RP-HPLC)
  5. Mass spec analysis


Protein Identification by Mass Spectrometry

• Mass Fingerprint (MS)
• Tandem Mass Spectrometry

• What is it?
  – A method for identifying an unknown protein based on measurement of the masses of peptides generated from it
  – Requires a mass spectrometer with high mass accuracy, a sequenced genome for the organism the sample is derived from, and a database search algorithm
• How does it work?
  – Just as every person has a unique fingerprint pattern that can be used to identify them, every protein generates a unique set of peptides after site-specific proteolysis
  – The set of masses of these peptides is therefore also unique to the parent protein and can likewise be used to identify it


PMF by MALDI-TOF MS:

1. Gel-separated proteins
2. Extract protein, digest with an enzyme such as trypsin (tryptic peptides: NH2-...-K-COOH or NH2-...-R-COOH)
3. MALDI-TOF analysis
4. Search masses against database

Mass Mapping / Peptide Mass Fingerprint

[Figure: list of peptide masses from the MS scan (m/z) → sequence database → identified protein]


Example

Step 1: Measured peptide masses

2978.4567 2646.2992 2186.1678 1981.8621 1131.4384 830.4519 780.4978 748.3698 742.4497 646.3228 ….

Step 2: In silico digestion w/ trypsin

>sp|P02666|CASB_BOVIN Beta-casein OS=Bos taurus
MKVLILACLVALALARELEELNVPGEIVESLSSSEESITRINKKIEKFQSEEQQQTEDEL
QDKIHPFAQTQSLVYPFPGPIPNSLPQNIPPLTQTPVVVPPFLQPEVMGVSKVKEAMAPK
HKEMPFPKYPVEPFTESQSLTLTDVENLHLPLPLLQSWMHQPHQPLPPTVMFPPQSVLSL
SQSKVLPVPQKAVPYPQRDMPIQAFLLYQEPVLGPVRGPFPIIV

MK VLILACLVALALAR ELEELNVPGEIVESLSSSEESITR INKKIEK FQSEEQQQTEDELQDK IHPFAQTQSLVYPFPGPIPNSLPQNIPPLTQTPVVVPPFLQPEVMGVSKVK EAMAPK HK EMPFPK YPVEPFTESQSLTLTDVENLHLPLPLLQSWMHQPHQPLPPTVMFPPQSVLSLSQSK VLPVPQK AVPYPQR DMPIQAFLLYQEPVLGPVR GPFPIIV

Theoretical peptide masses: 2646.2992 2186.1678 1981.8621 830.4519 780.4978 748.3698 742.4497 646.3228

Step 3: Compare with the measured list

2978.4567 2646.2992 2186.1678 1981.8621 1131.4384 830.4519 780.4978 748.3698 742.4497 646.3228 ….

Match!!
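The three steps of the example can be sketched in a few lines of Python. This is a minimal illustration, not the algorithm used by Mascot or MS-FIT: it assumes full tryptic cleavage (after K or R, but not before P), no missed cleavages, unmodified residues, singly protonated monoisotopic masses, and a hypothetical tolerance of 0.01 Da. The function names are made up for this sketch.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (free N- and C-termini)
PROTON = 1.00728   # charge carrier for [M+H]+

# Beta-casein sequence and measured masses as given in the example.
BETA_CASEIN = (
    "MKVLILACLVALALARELEELNVPGEIVESLSSSEESITRINKKIEKFQSEEQQQTEDEL"
    "QDKIHPFAQTQSLVYPFPGPIPNSLPQNIPPLTQTPVVVPPFLQPEVMGVSKVKEAMAPK"
    "HKEMPFPKYPVEPFTESQSLTLTDVENLHLPLPLLQSWMHQPHQPLPPTVMFPPQSVLSL"
    "SQSKVLPVPQKAVPYPQRDMPIQAFLLYQEPVLGPVRGPFPIIV"
)
OBSERVED = [2978.4567, 2646.2992, 2186.1678, 1981.8621, 1131.4384,
            830.4519, 780.4978, 748.3698, 742.4497, 646.3228]


def tryptic_digest(sequence):
    """Cleave C-terminal to K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        nxt = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def mh_plus(peptide):
    """Monoisotopic m/z of the singly protonated peptide, [M+H]+."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def match_masses(observed, peptides, tol=0.01):
    """Map each observed mass to the theoretical peptides within tol (Da)."""
    hits = {}
    for mass in observed:
        matched = [p for p in peptides if abs(mh_plus(p) - mass) <= tol]
        if matched:
            hits[mass] = matched
    return hits


peptides = tryptic_digest(BETA_CASEIN)
for mass, matched in match_masses(OBSERVED, peptides).items():
    print(f"{mass:10.4f}  {', '.join(matched)}")
```

Under these assumptions the measured values 830.4519, 780.4978 and 748.3698 fall within 0.01 Da of the computed [M+H]+ values of AVPYPQR, VLPVPQK and EMPFPK. A real search engine would also consider missed cleavages and modifications and would score the match statistically.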


Online database search

• Mascot: matrixscience.com
• MS-FIT: prospector.ucsf.edu


Protein Identification by Mass Spectrometry

• Mass Fingerprint
• Tandem Mass Spectrometry

• There are numerous methods for inducing peptide fragmentation
• Common methods for peptide fragmentation
  – Post-source decay (PSD), only in MALDI
  – Collision-induced Dissociation (CID)
  – Electron Capture Dissociation (ECD)
  – Electron Transfer Dissociation (ETD)
• CID is by far the most common, so we focus on it here; ECD and ETD will be discussed by Dr. Xia


CID Data Acquisition

Peptide CID Fragmentation

Fragmenting a Peptide

Sequence and Tandem MS Spectrum

Fragment Ions on a Tandem MS Spectrum

Strategy for Protein Sequencing by Database Searching

[Figure: the experimental MS/MS spectrum (%, m/z) is compared with theoretical spectra by a correlative database sequence search. Database → theoretical digestion (in silico) → theoretical fragmentation (in silico) → matched sequence (e.g., EDACLGAJK) → identified protein]
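The "theoretical fragmentation (in silico)" step can be illustrated with a short sketch that lists singly charged b and y ions, the fragment types typically dominant in CID spectra. The peptide AVPYPQR is taken from the beta-casein digest in the earlier example; the function name and the restriction to unmodified, singly charged fragments are assumptions of this sketch.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_i holds the first i residues; y_i holds the last i residues
    plus water. Index 0 of each list is b1 / y1.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_ions.append(sum(masses[:i]) + PROTON)
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return b_ions, y_ions


b_ions, y_ions = fragment_ions("AVPYPQR")
for i, (b, y) in enumerate(zip(b_ions, y_ions), start=1):
    print(f"b{i} = {b:9.4f}    y{i} = {y:9.4f}")
```

A search engine compares a list like this, computed for every candidate peptide in the database, against the peaks of the experimental MS/MS spectrum.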


Parameters for Database Searching

Sequence Databases

Protein sequence databases: NCBI, EMBL, Swiss-Prot, IPI, UniProt

Genomic and Expressed Sequence Tag (EST) databases:
• Human genome (draft completed in Oct 2004): ~25,000 genes
• Yeast genome (completed in April 1996): ~6,000 genes
• Mouse genome (completed in Dec 2004): ~25,000 genes
• Rice genome (draft completed in April 2002): >45,000 genes



Tandem MS

Mass accuracy of precursor ion

http://las.perkinelmer.com/Content/RelatedMaterials/007069_01.pdf


Tandem MS

Fragmentation pattern of Tandem MS

[Figure: four example MS/MS spectra with fragmentation quality labeled Good, Fair, Bad, and Terrible]



Search Algorithm

Search engines: Sequest (Thermo), Mascot (Matrix Science), X!Tandem (free), Spectrum Mill (Agilent)

• Shadforth et al., Proteomics, 2005, 5, 4082-4095 (review)
• Eng et al., JASMS, 5, 976 (Sequest)
• Perkins et al., Electrophoresis, 20, 3551 (Mascot)
• Kapp et al., Proteomics, 5, 3475 (comparison)


Search Algorithm

Sequest™ (patented the technique of using tandem MS and database searching for sequencing)

Scoring
• ni: the number of matched ions
• im: abundances of matched ions
• β: matched consecutive ions
• ρ: immonium ions associated with amino acid residues
• nt: total number of predicted sequence ions

Cross-correlation (Xcorr)
• x(t): signals from the reconstructed spectrum based on amino acid sequences
• y(t): signals from the reconstructed experimental spectrum
• τ: displacement value between the two signals
• Final value, after normalization: $C_n = C_{\tau=0} - \overline{C}_{-75<\tau<75}$, where the bar denotes the mean over the displacement range
• ΔCn: difference between the top two hits

Eng et al., JASMS, 5, 976.
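The scoring equations themselves were images on the slide and did not survive. As a hedged sketch of the cross-correlation idea only: the correlation at displacement τ is taken as R(τ) = Σ x[i]·y[i+τ], and the score is the value at τ = 0 minus the mean over −75 < τ < 75. The preliminary score usually quoted from Eng et al. has the form Sp = (Σ im)·ni·(1+β)(1+ρ)/nt; both forms should be checked against the paper. The binning of spectra into equal-width intensity lists, the inclusion of τ = 0 in the background, and the function names are assumptions of this sketch, and Sequest's spectrum preprocessing and normalization are not reproduced.

```python
def cross_correlation(x, y, tau):
    """R(tau) = sum over i of x[i] * y[i + tau], ignoring out-of-range bins."""
    n = len(y)
    total = 0.0
    for i, xi in enumerate(x):
        j = i + tau
        if 0 <= j < n:
            total += xi * y[j]
    return total


def xcorr_score(theoretical, experimental, max_lag=75):
    """Correlation at zero displacement minus the mean over -max_lag < tau < max_lag."""
    lags = range(-max_lag + 1, max_lag)
    background = sum(cross_correlation(theoretical, experimental, t) for t in lags)
    background /= len(lags)
    return cross_correlation(theoretical, experimental, 0) - background


# Toy binned spectra (hypothetical): unit peaks at two bins.
theoretical = [0.0] * 200
theoretical[50] = theoretical[100] = 1.0

matching = list(theoretical)   # same peak positions
shifted = [0.0] * 200          # peaks displaced by 10 bins
shifted[60] = shifted[110] = 1.0

print(xcorr_score(theoretical, matching))
print(xcorr_score(theoretical, shifted))
```

Subtracting the background rewards spectra that align at zero displacement specifically, instead of spectra that merely contain many peaks.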



Search Algorithm

Mascot™

Scoring: MOWSE program and probability-based scoring

MOWSE: MOlecular Weight SEarch. Pappin DJC, Hojrup P, and Bleasby AJ (1993) Rapid identification of proteins by peptide-mass fingerprinting. Curr. Biol. 3:327-332

Scoring is based on the peptide frequency distribution from a non-redundant database.
• Takes into account the relative abundance of peptides in the database when calculating scores.
• Protein size is compensated for.

Mascot score $S = -10\log_{10}(P)$, where P is the probability that the observed match is a random event

Probability-based scoring
• The probability that the observed match between experimental data and a protein sequence is a random event is approximately calculated for each protein in the sequence database.
• Probability model details not published.
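To make the score scale concrete, the relation S = -10·log10(P) can be evaluated for a few probabilities. This only illustrates the conversion from probability to score; how Mascot derives P is not published, and the function name is made up for this sketch.

```python
import math


def score_from_probability(p):
    """Convert the probability of a random match into a -10*log10(P) score."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -10.0 * math.log10(p)


for p in (0.05, 1e-3, 1e-5):
    print(f"P = {p:g}  ->  S = {score_from_probability(p):.1f}")
```

Each tenfold drop in the probability of a random match adds 10 to the score, so P = 0.05 corresponds to a score of about 13.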

Comparison Between MS/MS Search Algorithms

Heuristic algorithms: Sequest, Spectrum Mill, X!Tandem

Probabilistic algorithms: MASCOT, PeptideProphet (rescoring algorithm)

Kapp et al., Proteomics, 5, 3475.

Identification of Post-translational Modifications

Static
• Simply change the mass of any residue (or the N- or C-terminus).
• Adds no additional computer processing time.
• Preprocessed or pre-calculated peptide masses are no longer valid.

Variable/Differential
• Specified residues need to be considered as unmodified or modified in all combinations.
• Search complexity increases (a lot).
• Search time increases dramatically.
• Preprocessed or pre-calculated peptide masses are no longer directly valid.
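The growth in search complexity for variable modifications can be seen by enumerating the candidate forms of one peptide. The sketch below assumes phosphorylation (+79.96633 Da, HPO3) as the variable modification on S, T and Y; the peptide and the function name are hypothetical.

```python
from itertools import product

PHOSPHO = 79.96633  # monoisotopic mass shift of HPO3 (Da)


def variable_forms(peptide, sites="STY", shift=PHOSPHO):
    """List every modified/unmodified combination for the given sites.

    Returns (modified_positions, total_mass_shift) tuples; a peptide
    with n candidate sites yields 2**n forms.
    """
    positions = [i for i, aa in enumerate(peptide) if aa in sites]
    forms = []
    for states in product((False, True), repeat=len(positions)):
        modified = tuple(p for p, on in zip(positions, states) if on)
        forms.append((modified, shift * len(modified)))
    return forms


for modified, delta in variable_forms("ASTYK"):
    print(modified, f"+{delta:.4f} Da")
```

A static modification, by contrast, changes one entry in the residue mass table and leaves the number of candidates unchanged.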

Separation

Why do we need separations?
• Concentrate
• Multidimensional selectivity
• Eliminate interfering substances
• To be compatible with MS instruments (time frame; ion suppression; trapping capacity)

Separation Methods Coupling to MS

Separation → Interface → MS
• Separation: 2D PAGE (off line), IEF (mainly off line), LC, electrophoresis
• MS: ion trap, quadrupole, TOF, FTICR, sector

2D PAGE-MS

Limitations of 2D PAGE

2D-PAGE is still a powerful separation technique but has several disadvantages:
• Restricted to proteins < 10^6 and > 10^4 Da MW
• Cannot detect proteins expressed at low levels
• Typically limited to 600~800 separate spots
• Gel-to-gel reproducibility is poor
• Quantitation is poor, ± 50% or worse
• Dynamic range is limited, < 10X
• Analysis is not directly coupled to separation
• Analysis of membrane proteins is poor
• Time-consuming process

LC Coupled to ESI

Reverse-phase LC

Decrease column size, increase sensitivity

Size (column diameter)    Sensitivity
2.1 cm                    ---- (reference)
1.0 cm                    4.4 fold
4.6 mm                    21 fold
1.0 mm                    441 fold
50 μm                     176,400 fold
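The sensitivity figures are consistent with a gain that scales with the inverse square of the column diameter, relative to the 2.1 cm column. The sketch below checks that arithmetic; the underlying assumption (same injected amount, peak concentration inversely proportional to the column cross-section) is a simplification, and the function name is made up here.

```python
def relative_sensitivity(reference_diameter_um, diameter_um):
    """Gain in peak concentration when shrinking the column diameter."""
    return (reference_diameter_um / diameter_um) ** 2


# Column diameters from the slide, converted to micrometres.
COLUMNS_UM = {
    "2.1 cm": 21000,
    "1.0 cm": 10000,
    "4.6 mm": 4600,
    "1.0 mm": 1000,
    "50 um": 50,
}

for label, diameter in COLUMNS_UM.items():
    gain = relative_sensitivity(COLUMNS_UM["2.1 cm"], diameter)
    print(f"{label:>7}: {gain:,.1f} fold")
```

The 50 μm capillary is 420 times narrower than the 2.1 cm column, which gives the 176,400-fold figure (420 squared).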

LC Coupled to ESI

Typical system (micro-column) for proteomics research:
• Column: 15-100 μm diameter x 10-20 cm
• Packing material: 3-5 μm C18 silica
• Flow rate: 100-300 nL/min
• Spray needle: <30 μm
• Mobile phase: solvent A, 0.1% formic acid (or 0.5% acetic acid); solvent B, acetonitrile with 0.1% formic acid (or 0.5% acetic acid)
• Injection system: typically 1-10 μL (direct injection or autosampler injection)
• Gradient: most peptides elute at 10-40% B

Packed C18 Tip for μLC-MS: 50-100 μm fused silica capillary packed with C18

LC-MS/MS Data-dependent Acquisition

[Figure: ion chromatogram with an MS scan at 27.40 min followed by MS2 scans at 27.41, 27.41, 27.42, and 27.43 min]

Factors Limiting Efficiency of LC-MS/MS Experiments

• Multiple-peptide-per-protein redundancy (avg. 50)
• Specific precursors are selected repeatedly (in spite of dynamic exclusion)
• Only a fraction of sequencing attempts is successful
• Only a fraction of successful sequencing attempts identify differentially regulated proteins
• There is a conflict between sensitivity (small column) and sample capacity (larger column)
• Average peak width is 10-30 s for a single peptide on RPLC. For a 60 min gradient, the average number of peptides eluting at any moment is about 200 for a 500-protein mixture.
• Protein concentrations in cells span more than 7 orders of magnitude. Low-abundance peptides are usually overwhelmed by high-abundance ones.
• Limited MS data acquisition speed; ion suppression; trapping capacity
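The figure of about 200 co-eluting peptides can be reproduced with a back-of-the-envelope estimate. The sketch assumes the slide's average of 50 peptides per protein, peptides spread uniformly over the gradient, and a 30 s peak width (the upper end of the stated 10-30 s range); the function name is made up here.

```python
def coeluting_peptides(n_proteins, peptides_per_protein, peak_width_s, gradient_min):
    """Average number of peptides eluting at any instant, assuming
    peptides are spread uniformly across the gradient."""
    total_peptides = n_proteins * peptides_per_protein
    return total_peptides * peak_width_s / (gradient_min * 60)


for width in (10, 30):
    n = coeluting_peptides(500, 50, width, 60)
    print(f"peak width {width:2d} s -> ~{n:.0f} peptides at any moment")
```

Since an instrument fragments only a few precursors per cycle, most co-eluting peptides in such a mixture are never selected, which is the motivation for adding further separation dimensions.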

Orthogonal Separation Methods

• 1st dimension: 1D PAGE, IEF, affinity chromatography, electrophoresis, ion exchange, size exclusion, specific targeting (e.g., chemical derivatization)
• 2nd dimension: RPLC, ???

Two columns in one

[Figure: one capillary packed with IEX resin and RP resin]
• Principle: two different column packing materials in the same capillary
• 2D chromatography in a single operation
• Increase sample capacity?

MudPIT: Multi-dimensional Protein Identification Technology (Washburn et al., Nat Biotechnol, 2001)

Two-dimensional Orthogonal HPLC Separation

[Figure: buffers A, B, C, D and the sample are delivered to the SCX and RP phases, which are coupled to the mass spectrometer]

MudPIT cycle:
1) load sample
2) wash
3) salt step (stepwise increase in concentration)
4) wash
5) RP gradient
6) re-equilibration
7) go to step (3)

Buffers:
A: 5% acetonitrile/0.02% HFBA
B: 80% acetonitrile/0.02% HFBA
C: 250 mM NH4Ac/0.02% HFBA
D: 500 mM NH4Ac/0.02% HFBA
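The MudPIT cycle is a loop over salt steps, which a short sketch can make explicit. The salt concentrations below are hypothetical placeholders, not values from the slide or from Washburn et al., and the function name is made up here.

```python
def mudpit_schedule(salt_steps_mM):
    """Expand the MudPIT cycle into an ordered list of steps.

    Steps 1-2 run once; steps 3-6 repeat for every salt concentration
    (step 7 of the cycle is the jump back to step 3).
    """
    steps = ["load sample", "wash"]
    for conc in salt_steps_mM:
        steps.append(f"salt step ({conc} mM NH4Ac)")
        steps.append("wash")
        steps.append("RP gradient")
        steps.append("re-equilibration")
    return steps


# Hypothetical stepwise increase in salt concentration.
for number, step in enumerate(mudpit_schedule([25, 50, 100, 250, 500]), start=1):
    print(f"{number:2d}. {step}")
```

Each salt step releases one fraction of peptides from the SCX phase onto the RP phase, so the number of RP gradients, and hence total run time, grows linearly with the number of salt steps.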

Bottom-up Proteomics II

Covered topics

• Quantitative proteomics
• Post-translational modifications
• Protein complexes

Why quantify proteins?

What do you want to learn from a quantitative proteomics experiment?

Why quantify proteins?

[Figure: DNA → mRNA → functional protein → biological functions, with post-translational modifications and degradation acting on the protein]

The correlation between mRNA levels and protein expression levels is low (correlation coefficient < 0.4 overall)

How to quantify proteins?

There is a poor correlation between the amount of a peptide present and the MS and MS/MS signal intensities:
• MALDI: ionization hot spots
• Different ionization efficiency for different peptides
• Variable ion transmission
• Competition for charges
• Point of precursor ion selection in the chromatographic peak

Accurate Quantitation Using Isotope Dilution

1. Incorporate a stable heavy (H) isotope into one sample and the stable regular (L) isotope into the other (Sample 1 and Sample 2, the reference)
2. Combine samples
3. Analyze by mass spectrometer

• H/L analytes are chemically identical → identical specific signal in MS
• Ratio of H/L signals indicates ratio of analytes
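As a minimal sketch of how the H/L signal ratio is turned into a relative abundance, the code below divides heavy by light peak areas for several peptides of one protein and summarizes them as mean ± standard deviation. All peak areas are hypothetical, and averaging peptide ratios is only one of several possible ways to summarize a protein.

```python
from math import log2
from statistics import mean, stdev

# Hypothetical integrated peak areas, (light, heavy), for three
# peptides assigned to the same protein.
PAIRS = [(1.00e6, 0.75e6), (4.0e5, 3.2e5), (2.0e6, 1.5e6)]


def heavy_to_light(pairs):
    """Return the H/L ratio for each (light, heavy) peak-area pair."""
    return [heavy / light for light, heavy in pairs]


ratios = heavy_to_light(PAIRS)
protein_ratio = mean(ratios)
print("peptide H/L ratios:", [round(r, 3) for r in ratios])
print(f"protein H/L = {protein_ratio:.2f} +/- {stdev(ratios):.2f}")
print(f"log2 fold change = {log2(protein_ratio):.2f}")
```

Because the heavy and light forms are chemically identical, they share the same ionization efficiency, so the ratio is meaningful even though the absolute intensities are not.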

Stable Isotope Labeling Strategies

[Figure: three labeling routes, each ending in an intensity vs. m/z spectrum: metabolic stable isotope labeling (followed by digest); isotope tagging by chemical reaction (digest and label); stable isotope incorporation via enzyme reaction (digest)]

Metabolic stable isotope labeling

Prototypical applications:
• Zhou et al., RCMS (2002)
• Mann et al., Mol Cell Prot (2003)
• Veenstra et al., JASMS (2000)

• 15N-enriched media (ammonium sulfate-15N for yeast culture)
• Amino acids (Lys-13C, Arg-13C) for mammalian cell lines

Stable Isotope Tagging by Metabolic Labeling

Strengths
• No chemical reactions
• Potentially all peptides labeled
• Simple labeling protocols
• Quantitative labeling, no side reactions

Weaknesses
• Compatible with selected species and samples only
• No inherent sample enrichment
• Labeling potentially perturbs the biological system
• Label potentially metabolized
• Mass difference between sequence-identical peptides can vary

Stable Isotope Tagging by Metabolic Labeling

SILAC: Stable isotope labeling with amino acids in cell culture (Lys-13C, Arg-13C)

Stable isotope incorporation via enzyme reaction

Prototypical applications:
• Stewart et al., Rapid Comm in Mass Spec (2001)
• Reynolds KJ et al., J Prot Res (2002)
• Shevchenko A et al., Rapid Comm in Mass Spec (1997)
• Schnolzer M, Electrophoresis (1996)

Stable Isotope Incorporation via Enzyme Reaction

Digest with enzyme in H2O or H2^18O

Strengths
• General
• Compatible with any source of protein
• Constant mass shift

Weaknesses
• Minimal mass difference
• Side reactions

Isotope tagging by chemical reaction

Prototypical application:
• Isotope coded affinity tags (Gygi et al., 1999; Zhou et al., 2002)

Isotope Coded Affinity Tags (ICAT)

ICAT reagents: affinity group, labeled linker, reactive group
• Heavy reagent: d8-ICAT (X = deuterium)
• Light reagent: d0-ICAT (X = hydrogen)

Gygi et al., Nat Biotech, 1999
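Because the heavy reagent carries eight deuteriums in place of eight hydrogens, each labeled cysteine shifts the peptide mass by a fixed amount, and light/heavy pairs appear at a predictable m/z spacing. The sketch below computes that spacing from the hydrogen and deuterium atomic masses; the function name is made up here.

```python
H_MASS = 1.007825  # 1H atomic mass (Da)
D_MASS = 2.014102  # 2H (deuterium) atomic mass (Da)

# Mass difference between d8- and d0-labeled forms, per labeled cysteine.
D8_SHIFT = 8 * (D_MASS - H_MASS)


def pair_spacing(n_cysteines, charge):
    """m/z distance between the light and heavy peaks of one peptide."""
    return n_cysteines * D8_SHIFT / charge


print(f"mass shift per cysteine: {D8_SHIFT:.4f} Da")
for z in (1, 2, 3):
    print(f"one Cys, charge {z}+: spacing = {pair_spacing(1, z):.4f} m/z")
```

A doubly charged peptide with a single cysteine therefore shows its pair about 4 m/z apart, which is how pairs can be recognized in the MS1 spectrum before sequencing.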

Stable Isotope Tagging by Chemical Reaction

Strengths
• Compatible with any protein source
• Selective tagging reduces sample complexity
• Different specificities can be designed into the reagent
• Constant mass difference
• Potentially multiplexed

Weaknesses
• Cys-specific reagents miss cysteine-free proteins
• Chemical reactions required
• Each specificity requires a different reagent
• Tag might interfere with MS/MS
• Potential for side reactions and incomplete reactions
• Potential chromatographic isotope effect

[Figure: ICAT workflow. Mixture 1 (light) and mixture 2 (heavy) with labeled cysteines → combine and proteolyze → fractionations and affinity enrichment → MS analysis → quantification from the light/heavy pair (m/z 550-580) and identification by MS/MS (m/z 200-800, NH2-EACDPLR-COOH)]

Fractionations currently used: sub-cellular fractionation, immunoprecipitation, ion exchange, reversed-phase HPLC, isoelectric focusing

Human myeloid leukemia (HL-60) cells
• Well-characterized in vitro model for cell differentiation

+/- phorbol 12-myristate 13-acetate (PMA)
• Induces morphological changes
• Cells become more adherent

Expect to see changes in the cell-surface protein profile

Han et al. (2001) Nat Biotech 19:946-951

[Figure: chromatogram with 214 nm and 280 nm absorbance, pressure, and gradient traces]

• HL-60 microsomal fraction
• Biotinylated Cys residues
• Combined samples (d0 & d8)
• Tryptic digest

Extensive fractionation:
1. Cation exchange (SCX)
2. Affinity chromatography (avidin)
3. Capillary reverse phase

* ICAT-labeled peptides → MS2 (12 peptides total). Circled MS1 peaks are shown on the next slide.

Peptide sequencing by MS/MS →
• CD45 identified: transmembrane Tyr phosphatase
• ATC2 identified: calcium pump

Calculate the light:heavy ratio from MS1: CD45 = 1:0.7 and ATC2 = 1:1.2

LC retention time differs by 4 s

d0:d8 ratio = 1:0.77 +/- 0.05

Unchanged abundance: ribosomal proteins, cytoskeletal proteins, metabolic enzymes, cell-surface receptors, channel proteins

Changed abundance: membrane-associated signal transduction proteins, e.g., farnesyl-diphosphate farnesyl transferase (20-fold reduction)

• n range: 2-36
• d0:d8 range: 0.05-11.45

[Figure: proteins down-regulated and up-regulated w/ PMA treatment]

491 proteins identified from the microsomal fraction of HL-60 cells

Total analysis time = 50 h (100 min for one SCX fraction)

1025 proteins

Need 2 peptides from the same protein to confirm

Consequence: Identification of many uninteresting proteins

Quantitative analysis of androgen-regulated microsomal proteins from LNCaP prostate epithelia

[Figure: log d0/d8 vs. number of proteins identified]

Up to ~90% of identified proteins show unchanged abundance

Workflow (mixture 1, light, and mixture 2, heavy):
1. Isotope tag
2. Combine and digest
3. Affinity purification
4. Fractionation on MALDI sample plate for MS
5. Identify differentially expressed peptides for MS/MS; ignore peptides with unchanged abundance

Quantitative Analysis Based on Tandem Mass Spectrometry

iTRAQ™ (Applied Biosystems)

Ross et al., MCP, 3(12), 1154

[Figure: MS and MS/MS spectra of TPHPALTEAK + 114/115/116/117 (1:1:1:1)]
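With this kind of tag the quantitative information sits in the low-mass reporter region of the MS/MS spectrum. The sketch below sums the intensity near each nominal reporter mass (114 to 117) and normalizes across the four channels. The peak list is hypothetical, the ±0.3 m/z window is an arbitrary choice for illustration, and real software would use exact reporter masses and isotope-impurity correction.

```python
REPORTERS = (114, 115, 116, 117)  # nominal reporter ion m/z values


def reporter_fractions(peaks, tol=0.3):
    """Return each channel's share of the total reporter intensity.

    peaks is a list of (m/z, intensity) pairs from one MS/MS spectrum.
    """
    intensity = {
        channel: sum(i for mz, i in peaks if abs(mz - channel) <= tol)
        for channel in REPORTERS
    }
    total = sum(intensity.values())
    if total == 0:
        raise ValueError("no reporter ions found")
    return {channel: value / total for channel, value in intensity.items()}


# Hypothetical MS/MS peak list: four reporter ions plus sequence ions.
spectrum = [(114.1, 100.0), (115.1, 100.0), (116.1, 100.0), (117.1, 100.0),
            (175.1, 500.0), (429.2, 320.0)]

for channel, share in reporter_fractions(spectrum).items():
    print(f"{channel}: {share:.2f}")
```

For a 1:1:1:1 mixture, as in the slide's example, each channel should carry about a quarter of the reporter signal.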

Label-free Quantitation

Peak intensity:
1. Peak detection
2. Peak matching
3. Peak normalization
4. Area/height measurement
5. Statistical evaluation

Spectral counting:
1. Number of spectra correlates with protein abundance
2. Normalization
3. Statistical evaluation
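The spectral-counting route can be sketched as follows: counts are normalized to the total number of identified spectra in each run before runs are compared. The protein names and counts are hypothetical, total-count normalization is only one simple choice among several in use, and the statistical evaluation step is omitted.

```python
def normalize_counts(counts):
    """Divide each protein's spectral count by the run total."""
    total = sum(counts.values())
    return {protein: n / total for protein, n in counts.items()}


def abundance_ratios(run_a, run_b):
    """Ratio of normalized counts (run_b / run_a) for shared proteins."""
    norm_a, norm_b = normalize_counts(run_a), normalize_counts(run_b)
    shared = sorted(set(norm_a) & set(norm_b))
    return {protein: norm_b[protein] / norm_a[protein] for protein in shared}


# Hypothetical runs: run_b was simply acquired with twice as many spectra.
run_a = {"protein_1": 30, "protein_2": 10, "protein_3": 60}
run_b = {"protein_1": 60, "protein_2": 20, "protein_3": 120}

for protein, ratio in abundance_ratios(run_a, run_b).items():
    print(f"{protein}: {ratio:.2f}")
```

Without the normalization step every protein in this example would appear two-fold up-regulated, although only the amount of data acquired has changed.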

Protein Post-Translational Modifications (PTMs)

[Figure: histone modification]

[Figure: a kinase converts a protein to its phosphorylated form (P); a phosphatase reverses the reaction]

• Highly dynamic
• Low abundance
• Low ionization efficiency
• Poor fragmentation

Case Studies with Phosphoproteomics

A typical workflow for phosphoproteomics

[Figure: workflow steps include sampling, digestion, labeling, fractionation, enrichment of phosphopeptides (P), separation, MS analysis, identification, quantitation, and annotation]

Enrichment of phosphopeptides
• Metal oxide: TiO2; ZrO2
• Immobilized metal ion affinity chromatography (IMAC): Fe(III); Ga(III)
• PolyMAC: polymer-based metal ion affinity capture

[Figure: Ti-functionalized solid-phase beads and PolyMAC capturing phosphopeptides (P) from a mixture of peptides]

Identification of Post-translational Modifications

Static
• Simply change the mass of any residue (or the N- or C-terminus).
• Adds no additional computer processing time.
• Preprocessed or pre-calculated peptide masses are no longer valid.

Variable/Differential
• Specified residues need to be considered as unmodified or modified in all combinations.
• Search complexity increases (a lot).
• Search time increases dramatically.
• Preprocessed or pre-calculated peptide masses are no longer directly valid.

Identification of protein-protein interactions

• Immunopurified sample → proteolysis and separation → μLC-MS & MS/MS
• Immunopurified control → proteolysis and separation → μLC-MS & MS/MS

[Figure: comparison of proteins identified in control and sample; values shown: 72, 120, 15]

Advantage: one of the most sensitive and universal methods to identify interacting partners

Disadvantage: difficult to remove contaminants
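The sample-versus-control comparison amounts to a set difference on the lists of identified proteins. The sketch below uses hypothetical protein names; a real analysis would also weigh abundance (for example spectral counts or isotope ratios), since contaminants can appear in both lists at different levels.

```python
def candidate_interactors(sample_ids, control_ids):
    """Proteins identified in the immunopurified sample but not in the control."""
    return sorted(set(sample_ids) - set(control_ids))


# Hypothetical identifications from the two immunopurifications.
sample = {"bait", "partner_A", "partner_B", "keratin", "tubulin"}
control = {"keratin", "tubulin", "albumin"}

print("candidates:", candidate_interactors(sample, control))
print("shared background:", sorted(sample & control))
```

Proteins found in both purifications are treated as background here, which is how the control helps with the contaminant problem noted above.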
