Protein Folding Recognition with Committee Machine, Mika Takata

Uploaded by paula-norris on 18-Jan-2018

TRANSCRIPT

Protein Folding Recognition with Committee Machine
Mika Takata

Outline
Background
System Outline
Experiment
Experimental Results
References

Background
Computation + biology + chemistry + medicine + ... = significantly important
Structural Classification of Proteins (SCOP) database
Fold level: remote homology
Better fold recognition, better tertiary structure prediction
[Figure: SCOP hierarchy. Class: all alpha, all beta, α/β, α+β. Fold: globin-like, cytochrome c, cupredoxins, (TIM)-barrel, β-grasp, ...]

1. Chemical approach parameters (i)
i. 6 types of chemical features
ii. String-window N-grams
iii. Protein molecular weight
iv. Protein sequence length

1. Chemical approach parameters (ii): global parameters
Symbol C: frequencies of the 20 amino acid symbols in a protein sequence
Symbols S, H, V, P, Z (predicted secondary structure, hydrophobicity, normalized van der Waals volume, polarity, polarizability): 3-dim composition, 3-dim transition, 15-dim distribution

1. Chemical approach parameters (iii)
Protein molecular weight: the sum of the molecular weights of the amino acids, used as a feature
Protein sequence length: used as a feature

2. Feature parameters based on sliding-window N-grams
Similarity of proteomic fragments, string length = 2. For example, the sequence
NSDWTNNETRHAIVILIIIIIMLRHGKIPYWCMIPFAA
is read as the overlapping fragments NS, SD, DW, ...

3. Feature parameters based on HMM
[Fig. 1: Feature-parameter flow based on HMM. Labels: training data, test data, model.]

[Figure: system architecture. Inputs: C, S, H, V, P, Z, Seq-Length, Mol-Weight, Spectrum Kernel, HMM. Committee SVM_1 ... Committee SVM_27, followed by a decision stage. Stages labelled Step 1 and Step 2.]

Evaluation measure
Accuracy Q_i measures how well the proteins in class i are recognized. The number of proteins varies from class to class.

Experiment
Parameters
i. Chemical approach parameters
ii. Feature parameters based on the sliding-window (spectrum) kernel, string length = 2 and 3
iii. Feature parameters based on HMM
Classification methods
i. Independent SVM
ii. Committee SVM array
Multi-class recognition approaches
i. One-vs-others
ii. All-vs-all

Data set
Training data: 341; test data: 353 (total: 694)
Cross-validation: 10 times

Results
Result (1): Independent SVM, Model I
Result (2): Committee machine (CM), Model I
Result (3): CM, Model II
Result (4): Models I & II
Result (5): Models I & III
Result (6): Models I, II & III

Conclusion
Using all models in the committee machine improved recognition.
The spectrum kernel worked when used with a string length of 2.
Advantages
Takes advantage of diverse data (e.g., chemical-based and HMM-based features)
Reduces computational cost

References (i)
1. Takata, M., Matsuyama, Y.: Protein Folding Classification by Committee SVM Array. Lecture Notes in Computer Science 5507.
2. Matsuyama, Y., Kawasaki, K., Hotta, T., Mizutani, Takata, M., Ishida, A.: Eukaryotic transcription start site recognition involving non-promoter model. Intelligent Systems for Molecular Biology, Toronto (2008) L05.
3. Matsuyama, Y., Ishihara, Y., Ito, Y., Hotta, T., Kawasaki, K., Hasegawa, T., Takata, M.: Promoter recognition involving motif detection: Studies on E. coli and human genes. Intelligent Systems for Molecular Biology, Vienna (2007) H.
4. Dubchak, I., Muchnik, I., Holbrook, S.R., Kim, S.-H.: Prediction of protein folding class using global description of amino acid sequence. Proc. Natl. Acad. Sci. USA 92 (1995) 8700.
5. Dubchak, I., Muchnik, I., Mayor, C., Dralyuk, I., Kim, S.-H.: Recognition of a protein fold in the context of the SCOP classification. Proteins: Structure, Function, and Genetics 35 (1999) 401–407.

References (ii)
1. Ding, C.H.Q., Dubchak, I.: Multi-class protein fold recognition using support vector machines and neural networks. Bioinformatics 17 (2001) 349.
2. Mount, D.W.: Bioinformatics. Cold Spring Harbor Laboratory Press (2001).
3. Murzin, A.G., Brenner, S.E., Hubbard, T., Chothia, C.: SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247 (1995) 536.
4. Leslie, C., Eskin, E., Noble, W.S.: The spectrum kernel: A string kernel for SVM protein classification. Pacific Symposium on Biocomputing 7 (2002) 566.
5. Tabrez, M., Shamim, A., Anwaruddin, M., Nagarajaram, H.A.: Support vector machine-based classification of protein folds using the structural properties of amino acid residues and amino acid residue pairs. Bioinformatics 23 (2007) 3320.
6. Lodhi, H., Saunders, C., Shawe-Taylor, J., Watkins, C.: Text classification using string kernels. Journal of Machine Learning Research 2 (2002) 419–444.
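The global parameters of the chemical approach (symbol C and symbols S, H, V, P, Z) appear to follow the composition/transition/distribution scheme of Dubchak et al. The Python sketch below computes C and the 21-dimensional descriptor (3 + 3 + 15) for a single property. Several details are assumptions not found in the slides:

- the three-group hydrophobicity split commonly quoted from Dubchak et al.;
- one common rounding convention for the 25/50/75/100% positions;
- the function names, which are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumption: three-group hydrophobicity (H) split as commonly quoted from
# Dubchak et al. (polar / neutral / hydrophobic). S, V, P and Z would each use
# their own three-group split in exactly the same way.
HYDROPHOBICITY_GROUPS = ("RKEDQN", "GASTPHY", "CLVIMFW")


def composition_c(seq):
    """Symbol C: percentage of each of the 20 amino acids (20 dimensions)."""
    n = len(seq)
    return [100.0 * seq.count(a) / n for a in AMINO_ACIDS]


def ctd(seq, groups=HYDROPHOBICITY_GROUPS):
    """Composition (3) + transition (3) + distribution (15) = 21 dimensions."""
    n = len(seq)
    labels = []
    for residue in seq:
        for g, members in enumerate(groups):
            if residue in members:
                labels.append(g)
                break
        else:
            raise ValueError(f"residue {residue!r} is in no group")

    # Composition: percentage of residues falling in each group.
    composition = [100.0 * labels.count(g) / n for g in range(3)]

    # Transition: percentage of neighbouring residue pairs that switch between
    # two groups (in either direction), for group pairs 1-2, 1-3 and 2-3.
    pairs = list(zip(labels, labels[1:]))
    transition = []
    for a, b in ((0, 1), (0, 2), (1, 2)):
        switches = sum(1 for x, y in pairs if {x, y} == {a, b})
        transition.append(100.0 * switches / len(pairs) if pairs else 0.0)

    # Distribution: for each group, the position (as % of sequence length) of
    # its first residue and of the residues by which 25%, 50%, 75% and 100%
    # of that group's residues have appeared.
    distribution = []
    for g in range(3):
        positions = [i + 1 for i, lab in enumerate(labels) if lab == g]  # 1-based
        if not positions:
            distribution.extend([0.0] * 5)
            continue
        count = len(positions)
        ranks = [1] + [max(1, math.ceil(f * count)) for f in (0.25, 0.5, 0.75, 1.0)]
        distribution.extend(100.0 * positions[r - 1] / n for r in ranks)

    return composition + transition + distribution


if __name__ == "__main__":
    seq = "NSDWTNNETRHAIVILIIIIIMLRHGKIPYWCMIPFAA"  # example sequence from the slides
    print(len(composition_c(seq) + ctd(seq)))  # C (20) plus the H descriptor (21)
```

Concatenating C (20 dimensions) with the descriptors for S, H, V, P and Z (21 each) gives a 125-dimensional vector.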
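The two scalar chemical features, molecular weight and sequence length, can be sketched as follows. The slides define molecular weight as the sum of the amino acids' molecular weights, and that is the default here. Two parts of the sketch are additions rather than slide content. The table holds standard average weights of the free amino acids, rounded to two decimals. The optional water correction subtracts one H2O per peptide bond, which gives the usual peptide mass. The function names are hypothetical.

```python
# Average molecular weights (g/mol) of the free amino acids, rounded.
AA_MW = {
    "A": 89.09, "R": 174.20, "N": 132.12, "D": 133.10, "C": 121.16,
    "E": 147.13, "Q": 146.15, "G": 75.07, "H": 155.16, "I": 131.17,
    "L": 131.17, "K": 146.19, "M": 149.21, "F": 165.19, "P": 115.13,
    "S": 105.09, "T": 119.12, "W": 204.23, "Y": 181.19, "V": 117.15,
}
WATER_MW = 18.02  # one water molecule is released per peptide bond


def molecular_weight(seq, subtract_water=False):
    """Sum of amino acid weights; optionally corrected for peptide bonds."""
    total = sum(AA_MW[a] for a in seq)
    if subtract_water and len(seq) > 1:
        total -= WATER_MW * (len(seq) - 1)
    return total


def length_and_weight(seq):
    """The two scalar features: sequence length and molecular weight."""
    return [float(len(seq)), molecular_weight(seq)]
```

For example, `molecular_weight("GG", subtract_water=True)` gives about 132.12, the weight of glycylglycine.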
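The sliding-window N-gram features match the spectrum kernel of Leslie et al. in the references. Each sequence is mapped to the counts of its overlapping length-k fragments, and the kernel is the inner product of two such count vectors. Below is a minimal sketch for k = 2, the string length the conclusion reports as working. The cosine normalization is a common optional step that the slides do not specify. The function names are hypothetical.

```python
from collections import Counter
from itertools import product
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_counts(seq, k=2):
    """Count overlapping length-k fragments (a sliding window of width k)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_vector(seq, k=2):
    """Explicit 20**k-dimensional count vector in a fixed k-mer order."""
    counts = kmer_counts(seq, k)
    return [counts["".join(kmer)] for kmer in product(AMINO_ACIDS, repeat=k)]


def spectrum_kernel(x, y, k=2):
    """K(x, y): inner product of the k-mer count vectors of x and y."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    return sum(c * cy[kmer] for kmer, c in cx.items())  # missing k-mers count as 0


def normalized_spectrum_kernel(x, y, k=2):
    """Cosine-normalized kernel, so that K(x, x) == 1."""
    return spectrum_kernel(x, y, k) / math.sqrt(
        spectrum_kernel(x, x, k) * spectrum_kernel(y, y, k)
    )
```

The explicit vector grows as 20^k: 400 dimensions for k = 2 and 8,000 for k = 3. Computing the kernel from sparse counts, as `spectrum_kernel` does, avoids building those vectors.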
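The slides do not describe the HMM architecture. The sketch below therefore illustrates only one plausible way to turn HMMs into feature parameters:

1. Train one model per fold (training is not shown).
2. Score every sequence against each model with the scaled forward algorithm.
3. Use the length-normalized log-likelihoods as features.

The discrete-HMM representation, the per-fold setup and the helper names are all assumptions.

```python
import math


def forward_log_likelihood(seq, start, trans, emit):
    """log P(seq | HMM) computed with the scaled forward algorithm.

    start[i]    : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i]     : dict mapping a residue to its emission probability in state i
    """
    states = range(len(start))
    alpha = [start[i] * emit[i].get(seq[0], 0.0) for i in states]
    log_likelihood = 0.0
    for t in range(len(seq)):
        if t > 0:
            # Propagate through transitions, then weight by the emission of seq[t].
            alpha = [
                sum(alpha[i] * trans[i][j] for i in states) * emit[j].get(seq[t], 0.0)
                for j in states
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot emit this sequence
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        log_likelihood += math.log(scale)  # log P = sum of log scaling factors
    return log_likelihood


def hmm_features(seq, fold_models):
    """Hypothetical feature vector: one length-normalized score per fold model."""
    return [forward_log_likelihood(seq, *model) / len(seq) for model in fold_models]
```

Here each fold model is a `(start, trans, emit)` tuple. Dividing by the sequence length keeps scores for long and short proteins on a comparable scale.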
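The architecture figure shows one committee per class (Committee SVM_1 to Committee SVM_27). Each committee is fed by models built on the different feature sets. The slides do not state how the members' outputs are combined. The sketch below assumes a (weighted) average of member SVM decision values per class, then picks the class with the largest committee output. The combination rule, the names and the example scores are hypothetical.

```python
def committee_output(member_scores, weights=None):
    """Weighted average of member SVM decision values for one class (assumed rule)."""
    if weights is None:
        weights = [1.0] * len(member_scores)
    return sum(w * s for w, s in zip(weights, member_scores)) / sum(weights)


def committee_decision(scores_by_class):
    """scores_by_class maps class -> member decision values
    (e.g. one member per feature set: C, S, H, V, P, Z, spectrum kernel, HMM).
    Returns the winning class and every committee's output."""
    outputs = {c: committee_output(s) for c, s in scores_by_class.items()}
    return max(outputs, key=outputs.get), outputs


if __name__ == "__main__":
    # Hypothetical decision values from three members for two fold classes.
    winner, outputs = committee_decision(
        {"globin-like": [0.9, -0.1, 0.4], "TIM-barrel": [0.2, 0.3, 0.1]}
    )
    print(winner, outputs)
```

Averaging lets a strong feature set outvote weak ones. Weights could be fitted on validation data, but the slides give no indication of whether that was done.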
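The slides define accuracy Q_i per class and note that class sizes vary, so per-class and overall accuracy can tell different stories. The formula on the slide was not recoverable. The sketch below assumes the definitions commonly used in fold-recognition work:

- Q_i is the fraction of class-i proteins recognized correctly.
- Q is the fraction of all test proteins recognized correctly.

The function names are hypothetical.

```python
from collections import Counter


def per_class_accuracy(y_true, y_pred):
    """Q_i = correctly recognized proteins of class i / proteins in class i."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: correct[c] / n for c, n in totals.items()}


def overall_accuracy(y_true, y_pred):
    """Q = total correctly recognized / total proteins (weighted toward large classes)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

With labels `["a", "a", "b", "b", "b"]` and predictions `["a", "b", "b", "b", "a"]`, Q_a = 0.5 and Q_b ≈ 0.667, while Q = 0.6. The larger class pulls the overall figure toward its own accuracy.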
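The two multi-class recognition approaches both reduce a K-class problem to binary SVMs. One-vs-others trains K classifiers and picks the class with the largest decision value. All-vs-all trains K(K-1)/2 pairwise classifiers and takes a majority vote. The sketch below shows only the decision step and takes already-computed decision values as input. The tie-breaking rule (smallest label wins) is an arbitrary choice, and the names are hypothetical.

```python
from collections import Counter


def one_vs_others(scores):
    """scores maps each class to the decision value of its 'class vs rest' SVM."""
    return max(scores, key=scores.get)


def all_vs_all(pairwise):
    """pairwise maps (a, b) to the decision value of the a-vs-b SVM.
    A positive value is a vote for a, otherwise for b; majority wins,
    with ties broken by the smallest class label."""
    votes = Counter()
    for (a, b), value in pairwise.items():
        votes[a if value > 0 else b] += 1
    best = max(votes.values())
    return min(c for c, v in votes.items() if v == best)


def n_binary_classifiers(n_classes):
    """Number of binary SVMs each approach needs."""
    return {
        "one_vs_others": n_classes,
        "all_vs_all": n_classes * (n_classes - 1) // 2,
    }
```

If the 27 committees in the architecture figure correspond to 27 classes, one-vs-others needs 27 binary classifiers and all-vs-all needs 351.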