deep and fast analysis of high resolution ms/ms datacncp.ict.ac.cn/2014/res/ppt/report/hao...

45
Deep and Fast Analysis of High Resolution MS/MS Data 迟浩 中国科学院计算技术研究所 pFind 团队 2014-11-12 CNCP-2014 ——引擎那些事儿

Upload: others

Post on 09-Mar-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Deep and Fast Analysis of High Resolution MS/MS Datacncp.ict.ac.cn/2014/res/ppt/report/Hao Chi.pdf · 2014. 11. 18. · • Use HCD data to simulate CID data • Use HCD results as

Deep and Fast Analysis of High

Resolution MS/MS Data

迟浩中国科学院计算技术研究所

pFind团队

2014-11-12

CNCP-2014

——引擎那些事儿

Page 2: Deep and Fast Analysis of High Resolution MS/MS Datacncp.ict.ac.cn/2014/res/ppt/report/Hao Chi.pdf · 2014. 11. 18. · • Use HCD data to simulate CID data • Use HCD results as

The Achilles Heel of Proteomics

2

Scott D. Patterson March 2003 · Volume 21 Nature Biotechnology

Page 3: Deep and Fast Analysis of High Resolution MS/MS Datacncp.ict.ac.cn/2014/res/ppt/report/Hao Chi.pdf · 2014. 11. 18. · • Use HCD data to simulate CID data • Use HCD results as

Ruedi Aebersold, 2009

3

Page 4: Deep and Fast Analysis of High Resolution MS/MS Datacncp.ict.ac.cn/2014/res/ppt/report/Hao Chi.pdf · 2014. 11. 18. · • Use HCD data to simulate CID data • Use HCD results as

ID Rate of Low Resolution Data

4

• LTQ (Mascot + Sequest): 5366 / 15992 = 33.6

• QTOF (Mascot + Sequest): 3477 / 15309 = 22.7

Page 5: Deep and Fast Analysis of High Resolution MS/MS Datacncp.ict.ac.cn/2014/res/ppt/report/Hao Chi.pdf · 2014. 11. 18. · • Use HCD data to simulate CID data • Use HCD results as

5

Molecular & Cellular Proteomics 2011

ID Rate of High Resolution Data

Q Exactive: 37.9%

LTQ Orbitrap Velos: 56.2%

Page 6: Deep and Fast Analysis of High Resolution MS/MS Datacncp.ict.ac.cn/2014/res/ppt/report/Hao Chi.pdf · 2014. 11. 18. · • Use HCD data to simulate CID data • Use HCD results as

Identification Rate

• Present• What is the current rate?

• Past• Why low in the past?

• Prospect• How high in the future?

6

Page 7: Deep and Fast Analysis of High Resolution MS/MS Datacncp.ict.ac.cn/2014/res/ppt/report/Hao Chi.pdf · 2014. 11. 18. · • Use HCD data to simulate CID data • Use HCD results as

Identification Rate

• Present• What is the current rate?

• Past• Why low in the past?

• Prospect• How high in the future?

7

Page 8: Deep and Fast Analysis of High Resolution MS/MS Datacncp.ict.ac.cn/2014/res/ppt/report/Hao Chi.pdf · 2014. 11. 18. · • Use HCD data to simulate CID data • Use HCD results as

8

Molecular & Cellular Proteomics 2011

ID Rate of High Resolution Data

LTQ Orbitrap Velos: 56.2%

Page 10: Deep and Fast Analysis of High Resolution MS/MS Datacncp.ict.ac.cn/2014/res/ppt/report/Hao Chi.pdf · 2014. 11. 18. · • Use HCD data to simulate CID data • Use HCD results as

Search Parameters

10

Parameters Values

Target Database Uniprot_Human + 286 Contaminants

Decoy Database MQ: Reversal + Swapping | pFind: Reversal

Digestion Trypsin, Specific, Up to 2 Missing Cleavage Sites

Fixed Modifications Carbamidomethyl (C)

Variable Modifications Acetyl (Protein N-term), Oxidation (M)

Raw Extraction MQ: MaxQuant | pFind: pXtract

Mixture Spectra Extraction MQ: “Second Peptide“ closed | pFind: pParse closed

MS1 Precursor Tolerance ±20 ppm

MS2 Fragment Tolerance ±20 ppm

Validation FDR 1% at Spectrum Level

Page 11: Deep and Fast Analysis of High Resolution MS/MS Datacncp.ict.ac.cn/2014/res/ppt/report/Hao Chi.pdf · 2014. 11. 18. · • Use HCD data to simulate CID data • Use HCD results as

MS1 20 ppm | MS2 20 ppm

MQ = 10873 | 52.7

pFind = 12663 | 61.4

MQ ∩ pFind = 10570 | 51.2

MQ ∪ pFind = 12916 | 62.6

11

2043

pFind MQ

25310570

20631

Page 12: Deep and Fast Analysis of High Resolution MS/MS Datacncp.ict.ac.cn/2014/res/ppt/report/Hao Chi.pdf · 2014. 11. 18. · • Use HCD data to simulate CID data • Use HCD results as

Identification Rate

• Present• Q: What is the current rate?

• A: 50% ~ 60%

• Past• Why low in the past?

• Prospect• How high in the future?

12

Page 13: Deep and Fast Analysis of High Resolution MS/MS Datacncp.ict.ac.cn/2014/res/ppt/report/Hao Chi.pdf · 2014. 11. 18. · • Use HCD data to simulate CID data • Use HCD results as

Identification Rate

• Present• What is the current rate?

• Past• Why low in the past?

• Prospect• How high in the future?

13

Page 14: Deep and Fast Analysis of High Resolution MS/MS Datacncp.ict.ac.cn/2014/res/ppt/report/Hao Chi.pdf · 2014. 11. 18. · • Use HCD data to simulate CID data • Use HCD results as

Method

• Use HCD data to simulate CID data

• Use HCD results as benchmarks

14

Instrument Fragmentation Mode MS1 MS2

Orbitrap Velos HCD High-High 20 ppm 20 ppm

Orbitrap XL CID High-Low 20 ppm 0.5 Da

Q-TOF CID Low-Low 0.2 Da 0.2 Da

LTQ CID Low-Low 2.0 Da 0.8 Da

Page 15: Deep and Fast Analysis of High Resolution MS/MS Datacncp.ict.ac.cn/2014/res/ppt/report/Hao Chi.pdf · 2014. 11. 18. · • Use HCD data to simulate CID data • Use HCD results as

ID Rate: MaxQuant

15

Instrument Dissociation Mode MS1 MS2 MaxQuant

Orbitrap Velos HCD High-High 20 ppm 20 ppm 52.7 %

Orbitrap XL CID High-Low 20 ppm 0.5 Da 47.8 %

Q-TOF CID Low-Low 200 ppm 0.2 Da 37.0 %

LTQ CID Low-Low 2000 ppm 0.8 Da 24.6 %

Page 16: Deep and Fast Analysis of High Resolution MS/MS Datacncp.ict.ac.cn/2014/res/ppt/report/Hao Chi.pdf · 2014. 11. 18. · • Use HCD data to simulate CID data • Use HCD results as

ID Rate: pFind

16

Instrument Dissociation Mode MS1 MS2 pFind

Orbitrap Velos HCD High-High 20 ppm 20 ppm 61.4%

Orbitrap XL CID High-Low 20 ppm 0.5 Da 44.2%

Q-TOF CID Low-Low 0.2 Da 0.2 Da 40.5%

LTQ CID Low-Low 2.0 Da 0.8 Da 33.3%

Page 17: Deep and Fast Analysis of High Resolution MS/MS Datacncp.ict.ac.cn/2014/res/ppt/report/Hao Chi.pdf · 2014. 11. 18. · • Use HCD data to simulate CID data • Use HCD results as

◦ 65 + 55 distinctso few

◦ 9008 consistentso many

◦ 3600 unidentifiedso many

17

Why: 20 ppm vs 0.5 Da

3600

pFind

20 ppm | 20 ppm

pFind

20 ppm | 0.5 Da

659008

55

Page 18: Deep and Fast Analysis of High Resolution MS/MS Datacncp.ict.ac.cn/2014/res/ppt/report/Hao Chi.pdf · 2014. 11. 18. · • Use HCD data to simulate CID data • Use HCD results as

Low Resolution Result

Page 19: Deep and Fast Analysis of High Resolution MS/MS Datacncp.ict.ac.cn/2014/res/ppt/report/Hao Chi.pdf · 2014. 11. 18. · • Use HCD data to simulate CID data • Use HCD results as

The Same Peptide in High Resolution Mode

Page 20: Deep and Fast Analysis of High Resolution MS/MS Datacncp.ict.ac.cn/2014/res/ppt/report/Hao Chi.pdf · 2014. 11. 18. · • Use HCD data to simulate CID data • Use HCD results as

Actually…

K Q: = 0.0364

Page 21: Deep and Fast Analysis of High Resolution MS/MS Datacncp.ict.ac.cn/2014/res/ppt/report/Hao Chi.pdf · 2014. 11. 18. · • Use HCD data to simulate CID data • Use HCD results as

◦ 65 + 55 distinctso few

◦ 9008 consistentso many

◦ 3600 unidentifiedso many

21

Why: 20 ppm vs 0.5 Da

3600

pFind

20 ppm | 20 ppm

pFind

20 ppm | 0.5 Da

659008

55

Page 22: Deep and Fast Analysis of High Resolution MS/MS Datacncp.ict.ac.cn/2014/res/ppt/report/Hao Chi.pdf · 2014. 11. 18. · • Use HCD data to simulate CID data • Use HCD results as

◦ 65 + 55 distinctso few

◦ 9008 consistentso many

◦ 3600 unidentifiedso many

22

Why: 20 ppm vs 0.5 Da

3600

pFind

20 ppm | 20 ppm

pFind

20 ppm | 0.5 Da

659008

55

Page 23: Deep and Fast Analysis of High Resolution MS/MS Datacncp.ict.ac.cn/2014/res/ppt/report/Hao Chi.pdf · 2014. 11. 18. · • Use HCD data to simulate CID data • Use HCD results as

Why Unidentified?

MS2 = 20 ppm

MS2 = 0.5 Da

9008 Consistent

3600 Unidentified

decoy

log10(e-value)

Den

sit

y

(For charge 2+ only)

Page 24: Deep and Fast Analysis of High Resolution MS/MS Datacncp.ict.ac.cn/2014/res/ppt/report/Hao Chi.pdf · 2014. 11. 18. · • Use HCD data to simulate CID data • Use HCD results as

Identification Rate

• Present• What is the current rate?

• Past• Q: Why low in the past?

• A: Low discrimination power.

• Prospect• How high in the future?

24

Page 25: Deep and Fast Analysis of High Resolution MS/MS Datacncp.ict.ac.cn/2014/res/ppt/report/Hao Chi.pdf · 2014. 11. 18. · • Use HCD data to simulate CID data • Use HCD results as

Identification Rate

• Present• What is the current rate?

• Past• Why low in the past?

• Prospect• How high in the future?

25

MS/MS Data Protein DatabaseSearch Engine

Page 26: Deep and Fast Analysis of High Resolution MS/MS Datacncp.ict.ac.cn/2014/res/ppt/report/Hao Chi.pdf · 2014. 11. 18. · • Use HCD data to simulate CID data • Use HCD results as

1. Different Digestion Manners

26

Full-Specific94.7%

C-term Specific4.5%

N-term Specific0.7%

Non-Specific0.1%

Page 27: Deep and Fast Analysis of High Resolution MS/MS Datacncp.ict.ac.cn/2014/res/ppt/report/Hao Chi.pdf · 2014. 11. 18. · • Use HCD data to simulate CID data • Use HCD results as

2. Modifications

27

0% 2% 4% 6% 8% 10%

Glu->pyro-Glu[AnyN-termE]

Acetyl[AnyN-term]

Deamidated[N]

Oxidation[M]

Carboxymethyl[C]

Gln->pyro-Glu[AnyN-termQ]

Carbamidomethyl[C]

Deamidated[Q]

Percentage of the Modified Peptides

Page 28: Deep and Fast Analysis of High Resolution MS/MS Datacncp.ict.ac.cn/2014/res/ppt/report/Hao Chi.pdf · 2014. 11. 18. · • Use HCD data to simulate CID data • Use HCD results as

Deamidated[Q]

28

Page 29: Deep and Fast Analysis of High Resolution MS/MS Datacncp.ict.ac.cn/2014/res/ppt/report/Hao Chi.pdf · 2014. 11. 18. · • Use HCD data to simulate CID data • Use HCD results as

Deamidated[Q]

29

Page 30: Deep and Fast Analysis of High Resolution MS/MS Datacncp.ict.ac.cn/2014/res/ppt/report/Hao Chi.pdf · 2014. 11. 18. · • Use HCD data to simulate CID data • Use HCD results as

3. Mixture Spectra

30

Exported by pParse 2.0

Page 31: Deep and Fast Analysis of High Resolution MS/MS Datacncp.ict.ac.cn/2014/res/ppt/report/Hao Chi.pdf · 2014. 11. 18. · • Use HCD data to simulate CID data • Use HCD results as

3. Mixture Spectra

31

Automated labeled by pBuild

v3.0.7

Page 32: Deep and Fast Analysis of High Resolution MS/MS Datacncp.ict.ac.cn/2014/res/ppt/report/Hao Chi.pdf · 2014. 11. 18. · • Use HCD data to simulate CID data • Use HCD results as

Improve the ID rate

32

Parameters Values

Target Database Uniprot_Human + 286 Contaminants

Decoy Database MQ: Reversal + Swapping | pFind: Reversal

Digestion Trypsin, Specific, Up to 2 Missing Cleavage Sites

Fixed Modifications NULL

Variable Modifications Acetyl(Protein N-term), Carbamidomethyl(C), Oxidation(M), Carboxymethyl(C), Deamidation(NQ), GlnpyroGlu(AnyN-termQ)

Raw Extraction MQ: MaxQuant | pFind: pXtract

Mixture Spectra Extraction MQ: “Second Peptide“ open | pFind: pParse open

MS1 Precursor Tolerance ±7 ppm [To differentiate 0.984 from 1.003]

MS2 Fragment Tolerance ±20 ppm

Validation FDR 1% at Spectrum Level

Page 33: Deep and Fast Analysis of High Resolution MS/MS Datacncp.ict.ac.cn/2014/res/ppt/report/Hao Chi.pdf · 2014. 11. 18. · • Use HCD data to simulate CID data • Use HCD results as

Higher ID Rates

pFind 2.8 MaxQuant

33

2636 207

10834

2566

40

13181

6 modifications

3 modifications

Consistent

ID Rate: 76% ID Rate: 65%

Page 34: Deep and Fast Analysis of High Resolution MS/MS Datacncp.ict.ac.cn/2014/res/ppt/report/Hao Chi.pdf · 2014. 11. 18. · • Use HCD data to simulate CID data • Use HCD results as

Search Speed

• Dell Precision T1500 (8-Core PC )

• Routine Search• pFind = 10 min, MQ = 45 min

• Open pFind Search• 4 hours

• Re-Search• pFind = 25 min, MQ = 115 min

34

Page 35: Deep and Fast Analysis of High Resolution MS/MS Datacncp.ict.ac.cn/2014/res/ppt/report/Hao Chi.pdf · 2014. 11. 18. · • Use HCD data to simulate CID data • Use HCD results as

35

Space and Speed

Avrg.Peptides

AmplificationTime Used

pFind 2.8 MaxQuant

Full-Specific 349 1

/Semi-Specific 5,925 17

Non-Specific 26,895 77

Regular Search 625 2 10 min 45 min

Complex Modifications 2298 7 25 min (+240 min) 115 min

Uniprot-Human,Peptide length: 6 ~ 60,Digested by Trypsin

Page 36: Deep and Fast Analysis of High Resolution MS/MS Datacncp.ict.ac.cn/2014/res/ppt/report/Hao Chi.pdf · 2014. 11. 18. · • Use HCD data to simulate CID data • Use HCD results as

pFind 3: Deep Analysis

• Identification Rate of MS/MS scans

• 17630 | 85.5%

36

Specific94.7%

C-term Specific4.5%

N-term Specific0.7%

Non-Specific0.1%

0% 2% 4% 6% 8% 10%

Glu->pyro-Glu[AnyN-termE]

Acetyl[AnyN-term]

Deamidated[N]

Oxidation[M]

Carboxymethyl[C]

Gln->pyro-Glu[AnyN-termQ]

Carbamidomethyl[C]

Deamidated[Q]

Percentage of the Modified Peptides

Page 37: Deep and Fast Analysis of High Resolution MS/MS Datacncp.ict.ac.cn/2014/res/ppt/report/Hao Chi.pdf · 2014. 11. 18. · • Use HCD data to simulate CID data • Use HCD results as

Overlap

37

pFind 3MaxQuant

pFind 2.8

22.7%

60.1%

0.6%

0.2%

Identified MS2 Scans

0.7%

Page 38: Deep and Fast Analysis of High Resolution MS/MS Datacncp.ict.ac.cn/2014/res/ppt/report/Hao Chi.pdf · 2014. 11. 18. · • Use HCD data to simulate CID data • Use HCD results as

38

Space and Speed

Avrg.Peptides

AmplificationTime Used

pFind 2.8 MaxQuant pFind 3

Full-Specific 349 1

/Semi-Specific 5,925 17

Non-Specific 26,895 77

Regular Search 625 2 10 min 45 min /

Complex Modifications 2298 7 25 min (+240 min) 115 min /

Non-Specific +Unimod 62,931,180 180,319 / / 17 min

Uniprot-Human,Peptide length: 6 ~ 60,Digested by Trypsin

~ 20 spectra are identified per second

Page 39: Deep and Fast Analysis of High Resolution MS/MS Datacncp.ict.ac.cn/2014/res/ppt/report/Hao Chi.pdf · 2014. 11. 18. · • Use HCD data to simulate CID data • Use HCD results as

Unidentified Spectra?

• MS2 Scan: 2,000 ~ 20,000• 80% of total spectra

• ID Rate: 93.7%

39

Page 40: Deep and Fast Analysis of High Resolution MS/MS Datacncp.ict.ac.cn/2014/res/ppt/report/Hao Chi.pdf · 2014. 11. 18. · • Use HCD data to simulate CID data • Use HCD results as

More Datasets

Instrument No. Spectra ID Rate Speed (spec. / sec.)

1 LTQ Orbitrap Velos 64,112 56% 86% 17

2 LTQ Orbitrap Velos 486,411 30% 61% 51

3 Q Exactive 136,560 38% 71% 16

4 Q Exactive 1,934,361 16% 70% 54

40

Page 41: Deep and Fast Analysis of High Resolution MS/MS Datacncp.ict.ac.cn/2014/res/ppt/report/Hao Chi.pdf · 2014. 11. 18. · • Use HCD data to simulate CID data • Use HCD results as

Identification Rate

• Present• What is the current rate?

• Past• Why low in the past?

• Prospect• Q: How high in the future?

• A: 60% ~ 80%

41

Page 42: Deep and Fast Analysis of High Resolution MS/MS Datacncp.ict.ac.cn/2014/res/ppt/report/Hao Chi.pdf · 2014. 11. 18. · • Use HCD data to simulate CID data • Use HCD results as

Discussion: Deep or Fast?

• Only Restricted Search• Fast but simple

• Only Open Search• Deep but slow

• High Resolution + New Algorithm = Deep and Fast

42

Page 43: Deep and Fast Analysis of High Resolution MS/MS Datacncp.ict.ac.cn/2014/res/ppt/report/Hao Chi.pdf · 2014. 11. 18. · • Use HCD data to simulate CID data • Use HCD results as

Ten Years ago: 2003

43

Scott D. Patterson March 2003 · Volume 21 Nature Biotechnology

Page 44: Deep and Fast Analysis of High Resolution MS/MS Datacncp.ict.ac.cn/2014/res/ppt/report/Hao Chi.pdf · 2014. 11. 18. · • Use HCD data to simulate CID data • Use HCD results as

Ten years later: 2014 -

44

Scott D. Patterson March 2003 · Volume 21 Nature Biotechnology

Page 45: Deep and Fast Analysis of High Resolution MS/MS Datacncp.ict.ac.cn/2014/res/ppt/report/Hao Chi.pdf · 2014. 11. 18. · • Use HCD data to simulate CID data • Use HCD results as

45

Greetings from the pFind Team @ ICT. CAS. CHINA