Eukaryotic Secretome Prediction and Knowledge-Base Development

Xiang-Jia “Jack” Min, Ph.D., Assistant Professor

2nd International Conference on Proteomics & Bioinformatics, Las Vegas, July 2-4, 2012


DNA → RNA → protein → phenotype


Genome → (transcription) → Transcriptome: mRNA (protein-coding DNA sequences)

Transcriptome → (translation) → Proteome: protein sequences

Proteome → (secretion) → Secretome: proteins with secretory signal peptide


Günter Blobel


Fungi: yeasts, moulds, mushrooms

Secreted enzymes

Biomaterials, small molecules, bio-fuels, enzymes


How to identify secreted proteins?

Genome → (transcription) → Transcriptome → (translation) → Proteome → (secretion) → Secretome

(1) Direct identification using proteomics methods (Tsang et al. 2009)

(2) Computational prediction from predicted proteome

(3) EST data mining


Secreted Proteins

• Classical secreted proteins have a signal peptide at the N-terminus;

• Not all proteins that have a signal peptide are secreted:

• Signal peptide ≠ secreted protein


SignalP: a program to predict if a protein contains a signal peptide.

Phobius: signal peptide and transmembrane domain prediction.

WolfPsort: a multiple subcellular location predictor.

TargetP: detects proteins targeted to mitochondria.

TMHMM: transmembrane domain prediction.

PS-Scan: detection of ER-retention signals.
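
Because a signal peptide alone does not imply secretion, the outputs of these tools are typically combined as filters. The following is a minimal sketch of that idea, not the pipeline used in the talk: the `ToolCalls` record and its field names are hypothetical, and it assumes each tool's raw output has already been parsed into a simple flag or count.

```python
from dataclasses import dataclass


@dataclass
class ToolCalls:
    """Hypothetical per-protein summary of already-parsed tool outputs."""
    has_signal_peptide: bool    # e.g. from SignalP or Phobius
    tm_domains: int             # e.g. from TMHMM, counted outside the signal peptide
    mitochondrial: bool         # e.g. from TargetP
    er_retention_signal: bool   # e.g. from PS-Scan


def is_classical_secreted(calls: ToolCalls) -> bool:
    """Assumed rule: signal peptide present and no evidence of retention."""
    return (
        calls.has_signal_peptide
        and calls.tm_domains == 0
        and not calls.mitochondrial
        and not calls.er_retention_signal
    )


# A protein with a signal peptide but one transmembrane domain is filtered out.
print(is_classical_secreted(ToolCalls(True, 0, False, False)))  # True
print(is_classical_secreted(ToolCalls(True, 1, False, False)))  # False
```

Requiring every filter to pass trades sensitivity for specificity, so which filters to include is a choice to be evaluated against known secreted and non-secreted proteins.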


Human cytochrome C oxidase subunit 1 (COX1)


Data

Kingdom    Secreted  Non-secreted
Fungi           241         5,992
Animals       5,568        19,048
Plants          216         7,528
Protists         32         1,979


Method

• Sensitivity (%) = TP / (TP + FN) × 100

• Specificity (%) = TN / (TN + FP) × 100

• Matthews correlation coefficient (MCC): $\mathrm{MCC}\,(\%) = \dfrac{(TP \times TN - FP \times FN) \times 100}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$
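
As a quick check of these definitions, here is a small sketch that computes the three measures from confusion-matrix counts. The function names are illustrative; the example counts are the SignalP row reported for fungi in Table 1 (TP 232, FP 329, TN 5663, FN 9).

```python
import math


def sensitivity(tp: int, fn: int) -> float:
    """Sensitivity (%) = TP / (TP + FN) x 100."""
    return 100.0 * tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Specificity (%) = TN / (TN + FP) x 100."""
    return 100.0 * tn / (tn + fp)


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient, expressed as a percentage."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention chosen here for an undefined MCC
    return 100.0 * (tp * tn - fp * fn) / denom


tp, fp, tn, fn = 232, 329, 5663, 9
print(f"Sn={sensitivity(tp, fn):.1f} Sp={specificity(tn, fp):.1f} "
      f"MCC={mcc(tp, fp, tn, fn):.1f}")
# prints: Sn=96.3 Sp=94.5 MCC=61.2
```

The gap between a high specificity and a much lower MCC reflects the class imbalance in the data: with far more non-secreted than secreted proteins, even a small false-positive rate yields more false positives than true positives.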


Table 1. Prediction accuracies of secreted proteins in fungi

Method                                             TP    FP     TN   FN  Sn (%)  Sp (%)  MCC (%)
SignalP                                           232   329   5663    9    96.3    94.5     61.2
Phobius                                           226   203   5789   15    93.8    96.6     68.8
TargetP                                           228   583   5409   13    94.6    90.3     48.6
WolfPsort                                         230   167   5825   11    95.4    97.2     73.1
SignalP/TMHMM                                     228   168   5824   13    94.6    97.2     72.6
Phobius/TMHMM                                     224   200   5792   17    92.9    96.7     68.6
TargetP/TMHMM                                     224   265   5727   17    92.9    95.6     63.5
WolfPsort/TMHMM                                   227   135   5857   14    94.2    97.7     75.8
SignalP/TMHMM/WolfPsort                           226    86   5906   15    93.8    98.6     81.6
SignalP/TMHMM/WolfPsort/Phobius                   222    69   5923   19    92.1    98.8     83.1
SignalP/TMHMM/WolfPsort/Phobius/PS-Scan           222    67   5925   19    92.1    98.9     83.4
SignalP/TMHMM/WolfPsort/Phobius/TargetP/PS-Scan   218    66   5926   23    90.5    98.9     82.6

TP: true positives; FP: false positives; TN: true negatives; FN: false negatives. Sn: sensitivity; Sp: specificity; MCC: Matthews correlation coefficient.

Min XJ (2010) JPB 3:143-147.


Table 2. Prediction accuracies of secreted proteins in animals

Method                                             TP    FP     TN   FN  Sn (%)  Sp (%)  MCC (%)
SignalP                                          5307  4108  14940  261    95.3    78.4     63.5
Phobius                                          5157  1167  17881  411    92.6    93.9     82.8
TargetP                                          5313  5412  13636  255    95.4    71.6     56.5
WolfPsort                                        5135  1762  17286  433    92.2    90.7     77.3
SignalP/TMHMM                                    5217  1383  17665  351    93.7    92.7     81.6
Phobius/TMHMM                                    5148  1142  17906  420    92.5    94.0     82.9
TargetP/TMHMM                                    5222  1369  17679  346    93.8    92.8     81.8
WolfPsort/TMHMM                                  5093  1084  17964  475    91.5    94.3     82.8
Phobius/WolfPsort                                4959   555  18493  609    89.1    97.1     86.4
Phobius/WolfPsort/TMHMM                          4952   544  18504  616    88.9    97.1     86.5
Phobius/WolfPsort/TMHMM/SignalP                  4952   544  18504  616    88.9    97.1     86.5
Phobius/WolfPsort/TMHMM/TargetP                  4934   505  18543  634    88.6    97.3     86.7
Phobius/WolfPsort/TMHMM/TargetP/PS-Scan          4931   482  18566  637    88.6    97.5     86.9
Phobius/WolfPsort/TMHMM/TargetP/PS-Scan/SignalP  4931   482  18566  637    88.6    97.5     86.9

TP: true positives; FP: false positives; TN: true negatives; FN: false negatives. Sn: sensitivity; Sp: specificity; MCC: Matthews correlation coefficient.

Min XJ (2010) JPB 3:143-147.


Table 3. Prediction accuracies of secreted proteins in plants

Method                                             TP    FP     TN   FN  Sn (%)  Sp (%)  MCC (%)
SignalP                                           199   364   7164   17    92.1    95.2     55.4
Phobius                                           188   638   6890   28    87.0    91.5     41.9
TargetP                                           198   442   7086   18    91.7    94.1     51.3
WolfPsort                                         108    70   7458  108    50.0    99.1     53.9
SignalP/TMHMM                                     197   237   7291   19    91.2    96.9     63.0
Phobius/TMHMM                                     188   636   6892   28    87.0    91.6     42.0
TargetP/TMHMM                                     195   256   7272   21    90.3    96.6     61.1
WolfPsort/TMHMM                                   106    45   7483  110    49.1    99.4     57.7
SignalP/HMM/TargetP                               195   149   7379   21    90.3    98.0     70.6
Phobius/TargetP/TMHMM                             183   122   7406   33    84.7    98.4     70.4
SignalP/TMHMM/WolfPsort                           106    35   7493  110    49.1    99.5     59.9
SignalP/TMHMM/Phobius                             188   183   7345   28    87.0    97.6     65.2
SignalP/HMM/Phobius/TargetP                       183   113   7415   33    84.7    98.5     71.5
SignalP/HMM/Phobius/TargetP/PS-Scan               183   100   7428   33    84.7    98.7     73.2
SignalP/HMM/Phobius/TargetP/WolfPsort/PS-Scan     102    29   7499  114    47.2    99.6     59.8

TP: true positives; FP: false positives; TN: true negatives; FN: false negatives. Sn: sensitivity; Sp: specificity; MCC: Matthews correlation coefficient.

Min XJ (2010) JPB 3:143-147.


Summary

• Individual prediction tools differ in their accuracy for secretome prediction across kingdoms;

• Combining these tools often increases the prediction accuracy. However, different combinations are needed for species in different kingdoms.

• Optimal methods are proposed.
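
One plausible way to read an "optimal" method off Tables 1-3 is to pick, for each kingdom, the tool combination with the highest MCC. The sketch below does this for a handful of rows; the MCC values and method names are copied verbatim from the tables, while ranking by MCC alone and the names `MCC_BY_KINGDOM` and `best_method` are assumptions made for illustration.

```python
# MCC (%) for a few rows of Tables 1-3 (not the full tables).
MCC_BY_KINGDOM = {
    "fungi": {
        "SignalP": 61.2,
        "WolfPsort": 73.1,
        "SignalP/TMHMM/WolfPsort/Phobius/PS-Scan": 83.4,
    },
    "animals": {
        "SignalP": 63.5,
        "Phobius": 82.8,
        "Phobius/WolfPsort/TMHMM/TargetP/PS-Scan": 86.9,
    },
    "plants": {
        "SignalP": 55.4,
        "WolfPsort": 53.9,
        "SignalP/HMM/Phobius/TargetP/PS-Scan": 73.2,
    },
}


def best_method(kingdom: str) -> str:
    """Return the listed method with the highest MCC for a kingdom."""
    scores = MCC_BY_KINGDOM[kingdom]
    return max(scores, key=scores.get)


for kingdom in MCC_BY_KINGDOM:
    print(kingdom, "->", best_method(kingdom))
```

Even in this small subset the selected combination differs by kingdom: WolfPsort appears in the best fungal and animal combinations but not in the best plant one, where its sensitivity in Table 3 is only about 50%.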


FunSecKB (architecture diagram)

Database: RefSeq, UniProt

Prediction tools: SignalP, Phobius, WolfPsort, TargetP, TMHMM, PS-SCAN, fragAnchor

Manual curation; subcellular location

User inputs: gi, accession, UniProt ID, keywords, species

Views; external links

Lum G & Min XJ (2011) Database.


Summary of FunSecKB

• Currently the database contains a total of 478,073 fungal protein sequences

• 23,878 predicted and / or curated secreted proteins

• A total of 118 fungal species including 52 fungal species having a complete proteome

Lum G & Min XJ (2011) Database.


Plant secretomes and other subcellular proteins

Values are protein counts, with the percentage of total proteins in parentheses.

Category                  Vitis vinifera        Populus trichocarpa   Arabidopsis thaliana  Oryza sativa          Sorghum bicolor
Total proteins            29836                 41794                 32214                 39997                 32796
Secreted proteins         1892 (6.3)            2487 (6.0)            2835 (8.8)            3085 (7.7)            2394 (7.3)
Mitochondria
  Membrane                490 (1.6)             566 (1.4)             415 (1.3)             832 (2.1)             666 (2.0)
  Non-membrane            3877 (13.0)           5238 (12.5)           3729 (11.6)           7187 (18.0)           5768 (17.6)
Chloroplast
  Membrane                565 (1.9)             601 (1.4)             671 (2.1)             720 (1.8)             610 (1.9)
  Non-membrane            3675 (12.3)           4850 (11.6)           4865 (15.1)           6318 (15.8)           5385 (16.4)
ER proteins               29 (0.1)              37 (0.1)              60 (0.2)              32 (0.1)              25 (0.1)
Other membrane proteins   3251 (10.9)           4532 (10.8)           3649 (11.3)           3672 (9.2)            2900 (8.8)
Others (unknown)          16057 (53.8)          23483 (56.2)          15990 (49.6)          18151 (45.4)          15048 (45.9)


Acknowledgements

Gengkon Lum (M.S. Graduate)

Jessica Orr (Undergraduate)

Docylyne Shelton (Undergraduate)

Braden Walters (Undergraduate)