Identifying Extracellular Plant Proteins Based on Frequent Subsequences of Amino Acids
![Page 1: Identifying Extracellular Plant Proteins Based on Frequent Subsequences of Amino Acids](https://reader036.vdocuments.net/reader036/viewer/2022070502/56813886550346895da03974/html5/thumbnails/1.jpg)
Identifying Extracellular Plant Proteins Based on
Frequent Subsequences of Amino Acids
Y. Wang, O. Zaiane, R. Goebel
![Page 2: Identifying Extracellular Plant Proteins Based on Frequent Subsequences of Amino Acids](https://reader036.vdocuments.net/reader036/viewer/2022070502/56813886550346895da03974/html5/thumbnails/2.jpg)
Introduction
- Protein: linear sequence of amino acids
- Protein subcellular localization
  - Plant: nuclear, cytoplasmic, mitochondrial, extracellular, …
- Intracellular vs. extracellular
  - Sequence information alone
  - Class imbalance
  - Transparency
![Page 3: Identifying Extracellular Plant Proteins Based on Frequent Subsequences of Amino Acids](https://reader036.vdocuments.net/reader036/viewer/2022070502/56813886550346895da03974/html5/thumbnails/3.jpg)
Related Work
- N-terminal sorting signals
- Amino acid composition
- Lexical analysis
- Integrative approach
- Subsequence methods
![Page 4: Identifying Extracellular Plant Proteins Based on Frequent Subsequences of Amino Acids](https://reader036.vdocuments.net/reader036/viewer/2022070502/56813886550346895da03974/html5/thumbnails/4.jpg)
Predicting Extracellular Proteins
- Feature Extraction
- Support Vector Machine
- Boosting
- Frequent Pattern Method
![Page 5: Identifying Extracellular Plant Proteins Based on Frequent Subsequences of Amino Acids](https://reader036.vdocuments.net/reader036/viewer/2022070502/56813886550346895da03974/html5/thumbnails/5.jpg)
Feature Extraction
- Frequent subsequences: subsequences that occur in more than a certain percentage of extracellular proteins
  - Strong discriminative power
  - Perform similar functions via related biochemical mechanisms
  - Capture local similarity
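A minimal sketch of frequent-subsequence extraction, assuming that "subsequence" means a contiguous run of amino acids and that support is counted once per protein. The function name, the `max_len` cap, the "at least" threshold and the toy sequences are illustrative assumptions, not taken from the slides. This brute-force version only shows the idea; a generalized suffix tree would be the efficient route for a real data set.

```python
from collections import Counter


def frequent_subsequences(sequences, min_sup, min_len=3, max_len=6):
    """Return {substring: support} for contiguous substrings that occur
    in at least `min_sup` (a fraction) of the given sequences."""
    counts = Counter()
    for seq in sequences:
        seen = set()
        for length in range(min_len, max_len + 1):
            for start in range(len(seq) - length + 1):
                seen.add(seq[start:start + length])
        # Count each substring at most once per protein.
        counts.update(seen)
    threshold = min_sup * len(sequences)
    return {s: c / len(sequences) for s, c in counts.items() if c >= threshold}


# Toy sequences (hypothetical, for illustration only).
toy = ["MKTLLAV", "AKTLLQ", "MKTAAV", "GGGG"]
result = frequent_subsequences(toy, min_sup=0.5, min_len=3, max_len=4)
```

Here `result` keeps only the substrings shared by at least half of the toy sequences, each mapped to its support.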
![Page 6: Identifying Extracellular Plant Proteins Based on Frequent Subsequences of Amino Acids](https://reader036.vdocuments.net/reader036/viewer/2022070502/56813886550346895da03974/html5/thumbnails/6.jpg)
Generalized Suffix Tree
![Page 7: Identifying Extracellular Plant Proteins Based on Frequent Subsequences of Amino Acids](https://reader036.vdocuments.net/reader036/viewer/2022070502/56813886550346895da03974/html5/thumbnails/7.jpg)
Support Vector Machine
- Input data represented as feature vectors
- Find a linear separator that separates the data and maximizes the margin
- Kernel function: nonlinear separator
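For reference, the textbook hard-margin formulation behind "maximize the margin" can be written as follows. The symbols ($w$, $b$, $x_i$, $y_i \in \{-1, +1\}$, $\alpha_i$, $K$) are standard notation assumed here, not taken from the slides, and the slides do not say whether a hard or soft margin was used.

$$
\min_{w,\,b}\ \frac{1}{2}\lVert w \rVert^2
\quad \text{subject to} \quad
y_i \left( w^\top x_i + b \right) \ge 1, \qquad i = 1, \dots, n
$$

The resulting margin is $2 / \lVert w \rVert$. With a kernel function $K$, the decision function becomes

$$
f(x) = \operatorname{sign}\!\left( \sum_{i=1}^{n} \alpha_i \, y_i \, K(x_i, x) + b \right)
$$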
![Page 8: Identifying Extracellular Plant Proteins Based on Frequent Subsequences of Amino Acids](https://reader036.vdocuments.net/reader036/viewer/2022070502/56813886550346895da03974/html5/thumbnails/8.jpg)
SVM for extracellular protein prediction
- Data transformation (sequence → vector)
  - Frequent subsequences as features
  - Transform protein sequences into binary vectors
- Kernel functions
  - Linear kernel
  - Polynomial kernel
  - RBF kernel
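A small sketch of the sequence-to-vector step and the three kernels, under stated assumptions: a feature is set to 1 when the frequent subsequence occurs anywhere in the protein, and the kernel parameters (`degree`, `coef0`, `gamma`) are placeholder values, not the settings used in the experiments. All function names and the toy features are hypothetical.

```python
import math


def to_binary_vector(sequence, features):
    """1 if the frequent subsequence occurs in the protein, else 0."""
    return [1 if f in sequence else 0 for f in features]


def linear_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, degree=2, coef0=1.0):
    return (linear_kernel(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=0.5):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


# Toy features and proteins (illustrative only).
features = ["MKT", "KTL", "TLL"]
x = to_binary_vector("MKTLLAV", features)
z = to_binary_vector("AKTLLQ", features)
```

On binary vectors the linear kernel simply counts the frequent subsequences that two proteins share.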
![Page 9: Identifying Extracellular Plant Proteins Based on Frequent Subsequences of Amino Acids](https://reader036.vdocuments.net/reader036/viewer/2022070502/56813886550346895da03974/html5/thumbnails/9.jpg)
Boosting
- Iterative algorithm that improves a weak classifier
- A different weighted distribution of examples in each iteration
- Increase the weights of incorrectly classified examples, and decrease the weights of correctly classified ones
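The re-weighting step can be sketched with the standard discrete AdaBoost update. This is an assumption: the slides do not state which boosting variant or weak learner was used. Labels and predictions are taken to be in {-1, +1}, and the weights are assumed to sum to 1.

```python
import math


def adaboost_reweight(weights, labels, predictions):
    """One round of the standard discrete AdaBoost weight update.

    Returns (alpha, new_weights), where alpha is the vote of the weak
    classifier and new_weights is the normalized example distribution.
    """
    error = sum(w for w, y, h in zip(weights, labels, predictions) if y != h)
    if not 0.0 < error < 0.5:
        raise ValueError("weak classifier must have weighted error in (0, 0.5)")
    alpha = 0.5 * math.log((1.0 - error) / error)
    # y * h is +1 for a correct prediction (weight shrinks)
    # and -1 for a mistake (weight grows).
    updated = [w * math.exp(-alpha * y * h)
               for w, y, h in zip(weights, labels, predictions)]
    total = sum(updated)
    return alpha, [w / total for w in updated]


# Four equally weighted examples; only the last one is misclassified.
alpha, new_weights = adaboost_reweight(
    [0.25, 0.25, 0.25, 0.25], [1, 1, -1, -1], [1, 1, -1, 1]
)
```

After the update the misclassified example carries half of the total weight, so the next weak classifier is pushed to get it right.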
![Page 10: Identifying Extracellular Plant Proteins Based on Frequent Subsequences of Amino Acids](https://reader036.vdocuments.net/reader036/viewer/2022070502/56813886550346895da03974/html5/thumbnails/10.jpg)
AdaBoost
![Page 11: Identifying Extracellular Plant Proteins Based on Frequent Subsequences of Amino Acids](https://reader036.vdocuments.net/reader036/viewer/2022070502/56813886550346895da03974/html5/thumbnails/11.jpg)
Frequent Pattern Method
- Frequent pattern: *X1*X2*…*Xn* → extracellular
  - X1, X2, …, Xn are frequent subsequences
  - “*” can be replaced by zero to MaxGap amino acids when matching a protein sequence
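One way to sketch the matching rule is to translate a pattern into a regular expression in which every "*" becomes a bounded wildcard. This is a literal reading of the slide, so the leading and trailing "*" are also limited to MaxGap residues; that reading, the function names and the toy sequences are assumptions.

```python
import re


def compile_pattern(subsequences, max_gap):
    """Build a regex for *X1*X2*...*Xn*, where each '*' stands for
    between 0 and max_gap arbitrary amino acids."""
    gap = ".{0,%d}" % max_gap
    body = gap.join(re.escape(s) for s in subsequences)
    return re.compile(gap + body + gap)


def matches(pattern, sequence):
    """True if the whole protein sequence fits the pattern."""
    return pattern.fullmatch(sequence) is not None


# Hypothetical pattern *MK*LL* with MaxGap = 2.
pattern = compile_pattern(["MK", "LL"], max_gap=2)
```

With a large MaxGap the end wildcards rarely matter, and the pattern mostly constrains the order of the frequent subsequences and the distance between them.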
![Page 12: Identifying Extracellular Plant Proteins Based on Frequent Subsequences of Amino Acids](https://reader036.vdocuments.net/reader036/viewer/2022070502/56813886550346895da03974/html5/thumbnails/12.jpg)
FOIL algorithm
![Page 13: Identifying Extracellular Plant Proteins Based on Frequent Subsequences of Amino Acids](https://reader036.vdocuments.net/reader036/viewer/2022070502/56813886550346895da03974/html5/thumbnails/13.jpg)
Z-number
- Computed from the accuracy of rule R and the support of rule R (formula not preserved in this transcript)
![Page 14: Identifying Extracellular Plant Proteins Based on Frequent Subsequences of Amino Acids](https://reader036.vdocuments.net/reader036/viewer/2022070502/56813886550346895da03974/html5/thumbnails/14.jpg)
![Page 15: Identifying Extracellular Plant Proteins Based on Frequent Subsequences of Amino Acids](https://reader036.vdocuments.net/reader036/viewer/2022070502/56813886550346895da03974/html5/thumbnails/15.jpg)
Experiments
- Dataset (PASub project at UofA)
  - Plant: 3293 proteins, 171 extracellular
- Five-fold cross-validation
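A sketch of how five folds could be built for a data set of this shape. Stratifying by class, so that each fold receives roughly the same share of the 171 extracellular proteins, is an assumption; the slides do not say how the folds were drawn. The function name and seed are illustrative.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Split example indices into k folds, dealing each class out
    round-robin so class proportions stay similar across folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


# Same class sizes as the plant data set: 171 positives out of 3293.
labels = [1] * 171 + [0] * (3293 - 171)
folds = stratified_folds(labels, k=5)
```

Each fold serves once as the test set while the remaining four are used for training.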
![Page 16: Identifying Extracellular Plant Proteins Based on Frequent Subsequences of Amino Acids](https://reader036.vdocuments.net/reader036/viewer/2022070502/56813886550346895da03974/html5/thumbnails/16.jpg)
Evaluation Matrix
- Overall accuracy is not good enough
- F-measure
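A short sketch of why accuracy alone misleads on this data set: a classifier that labels every protein as intracellular is right on 3122 of 3293 proteins but finds no extracellular protein at all. The balanced F-measure (beta = 1) is an assumption, since the slide does not state which weighting was used, and the function names are illustrative.

```python
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def f_measure(tp, fp, fn, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta=1)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# "Always intracellular" baseline: 3293 proteins, 171 extracellular.
baseline_acc = accuracy(tp=0, tn=3122, fp=0, fn=171)
baseline_f = f_measure(tp=0, fp=0, fn=171)
```

The baseline reaches an accuracy of about 0.948 with an F-measure of 0, which is why the F-measure on the extracellular class is the more informative number here.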
![Page 17: Identifying Extracellular Plant Proteins Based on Frequent Subsequences of Amino Acids](https://reader036.vdocuments.net/reader036/viewer/2022070502/56813886550346895da03974/html5/thumbnails/17.jpg)
Result (SVM with subsequence)
![Page 18: Identifying Extracellular Plant Proteins Based on Frequent Subsequences of Amino Acids](https://reader036.vdocuments.net/reader036/viewer/2022070502/56813886550346895da03974/html5/thumbnails/18.jpg)
Result (Boosting with subsequence)
![Page 19: Identifying Extracellular Plant Proteins Based on Frequent Subsequences of Amino Acids](https://reader036.vdocuments.net/reader036/viewer/2022070502/56813886550346895da03974/html5/thumbnails/19.jpg)
Result (Frequent Pattern)
MinLen = 3, Min_gain = 0.1
03.08.0
MinSup = 5%, MinConf = 80%, MaxGap = 300
![Page 20: Identifying Extracellular Plant Proteins Based on Frequent Subsequences of Amino Acids](https://reader036.vdocuments.net/reader036/viewer/2022070502/56813886550346895da03974/html5/thumbnails/20.jpg)
Result (SVM with composition)
![Page 21: Identifying Extracellular Plant Proteins Based on Frequent Subsequences of Amino Acids](https://reader036.vdocuments.net/reader036/viewer/2022070502/56813886550346895da03974/html5/thumbnails/21.jpg)
Result (Boosting with composition)
![Page 22: Identifying Extracellular Plant Proteins Based on Frequent Subsequences of Amino Acids](https://reader036.vdocuments.net/reader036/viewer/2022070502/56813886550346895da03974/html5/thumbnails/22.jpg)
Cross Comparison
![Page 23: Identifying Extracellular Plant Proteins Based on Frequent Subsequences of Amino Acids](https://reader036.vdocuments.net/reader036/viewer/2022070502/56813886550346895da03974/html5/thumbnails/23.jpg)
SVM with combined features
![Page 24: Identifying Extracellular Plant Proteins Based on Frequent Subsequences of Amino Acids](https://reader036.vdocuments.net/reader036/viewer/2022070502/56813886550346895da03974/html5/thumbnails/24.jpg)
Boosting with combined features
![Page 25: Identifying Extracellular Plant Proteins Based on Frequent Subsequences of Amino Acids](https://reader036.vdocuments.net/reader036/viewer/2022070502/56813886550346895da03974/html5/thumbnails/25.jpg)
Effects of MinLen on SVM
![Page 26: Identifying Extracellular Plant Proteins Based on Frequent Subsequences of Amino Acids](https://reader036.vdocuments.net/reader036/viewer/2022070502/56813886550346895da03974/html5/thumbnails/26.jpg)
Effects of MinLen on boosting
![Page 27: Identifying Extracellular Plant Proteins Based on Frequent Subsequences of Amino Acids](https://reader036.vdocuments.net/reader036/viewer/2022070502/56813886550346895da03974/html5/thumbnails/27.jpg)
Conclusion
- Presented three methods for identifying extracellular proteins based on frequent subsequences of amino acids
- SVM achieves the best result
- FSP method provides easily interpretable rules
![Page 28: Identifying Extracellular Plant Proteins Based on Frequent Subsequences of Amino Acids](https://reader036.vdocuments.net/reader036/viewer/2022070502/56813886550346895da03974/html5/thumbnails/28.jpg)
Future Work
- Use more information about proteins (e.g., structure, function, …)
- Integrate amino acid composition into the FSP method
- Incorporate more biological knowledge