Prediction of Protein Binding Sites in Protein Structures Using Hidden Markov Support Vector Machine
Slate: the target protein. Blue: the binding partner. Magenta: interface residues.
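Input (amino acid sequence of the target protein):
SSSEIKIVRDEYGMPHIYANDTWHLFYGYG
Output (one label per residue; I = interface, N = non-interface):
IIINIINNIINNNIIIIIIINIINIIINNN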
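The input/output pairing above can be sketched as a simple per-residue alignment. This is an illustration only: the helper name `interface_positions` is hypothetical, and it assumes that `I` marks an interface residue and `N` a non-interface residue.

```python
# Toy example taken from the slide: one label per residue.
sequence = "SSSEIKIVRDEYGMPHIYANDTWHLFYGYG"
labels = "IIINIINNIINNNIIIIIIINIINIIINNN"


def interface_positions(seq: str, lab: str) -> list[int]:
    """Return 1-based positions of residues labelled 'I' (assumed: interface)."""
    if len(seq) != len(lab):
        raise ValueError("sequence and label string must have the same length")
    return [i + 1 for i, tag in enumerate(lab) if tag == "I"]


positions = interface_positions(sequence, labels)
print(f"{len(positions)} of {len(sequence)} residues are labelled as interface")
```

Because the labels form a sequence aligned with the residues, the task can be treated either as independent per-residue classification or as sequential labelling.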
Machine Learning Methods Applied
• Classification methods: ANN, SVM
• Sequential labelling methods: CRF
FEATURES
• Neighboring residue profile feature
• Hydrophobicity
• Sequence conservation
• Secondary structure
• Solvent accessible surface area
Hidden Markov Support Vector Machine
Discriminant function
Emission feature function
Transition feature function
Corresponding weight
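The four labels above annotate the terms of the HM-SVM discriminant function. A minimal sketch of its usual form follows, assuming residue-level emission features $\mathbf{e}$, label-pair transition features $\mathbf{t}$, and weight vectors $\mathbf{w}_e$ and $\mathbf{w}_t$. The symbol names and the neighbor set $N(i)$ are assumptions made for illustration, not notation confirmed by the slide.

```latex
F(x, y) = \sum_{i=1}^{n} \sum_{k \in N(i)} \mathbf{w}_e \cdot \mathbf{e}(x_k, y_i)
        + \sum_{i=2}^{n} \mathbf{w}_t \cdot \mathbf{t}(x, y_{i-1}, y_i),
\qquad
\hat{y} = \arg\max_{y} F(x, y)
```

Under this reading, the emission terms score how well each label fits the residue and its neighbors, while the transition terms score how plausible each pair of adjacent labels is.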
Hidden Markov Support Vector Machine
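Emission feature function

Spatially neighboring residue profile feature
$$e^{\mathrm{profile}}_{y,aa}(x_k, y_i) = \begin{cases} L(\mathrm{PSSM}(x_k, aa)), & \text{if } y_i = y \\ 0, & \text{otherwise} \end{cases}$$

Spatially neighboring residue accessible surface area (ASA) feature
$$e^{\mathrm{ASA}}_{y}(x_k, y_i) = \begin{cases} \mathrm{ASA}(x_k), & \text{if } y_i = y \\ 0, & \text{otherwise} \end{cases}$$

where
$$L(x) = \begin{cases} 0, & \text{if } x \le -5 \\ \frac{1}{2} + \frac{x}{10}, & \text{if } -5 < x < 5 \\ 1, & \text{otherwise} \end{cases}$$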
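The emission features can be sketched in a few lines. The function names (`scale_pssm`, `profile_emission`, `asa_emission`) and the dictionary representation of a PSSM row are hypothetical choices for illustration; only the piecewise definitions come from the slide.

```python
def scale_pssm(x: float) -> float:
    """Piecewise-linear L(x): 0 for x <= -5, 1 for x >= 5, 1/2 + x/10 in between."""
    if x <= -5:
        return 0.0
    if x < 5:
        return 0.5 + x / 10.0
    return 1.0


def profile_emission(pssm_row: dict[str, float], aa: str, y_i: str, y: str) -> float:
    """Profile feature for label y and amino acid aa at a neighboring residue x_k.

    pssm_row maps amino-acid letters to raw PSSM scores for residue x_k
    (an assumed data layout).
    """
    return scale_pssm(pssm_row[aa]) if y_i == y else 0.0


def asa_emission(asa: float, y_i: str, y: str) -> float:
    """ASA feature for label y: the accessible surface area of x_k if y_i == y."""
    return asa if y_i == y else 0.0


row = {"A": -7.0, "L": 0.0, "W": 3.0}  # toy PSSM scores
print(profile_emission(row, "W", "I", "I"), asa_emission(0.42, "N", "I"))
```

Each feature is active only for the label it is tied to, so the model learns a separate weight per label for the same underlying measurement.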
Hidden Markov Support Vector Machine
Transition feature function
$$t_{y,y'}(x, y_{i-1}, y_i) = \begin{cases} 1, & \text{if } y_{i-1} = y \wedge y_i = y' \\ 0, & \text{otherwise} \end{cases}$$
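As a small illustration, the transition feature is an indicator on a pair of adjacent labels, and summing it along a label sequence gives one count per label pair. The helper names below are hypothetical.

```python
from collections import Counter


def transition_feature(y: str, y_next: str, prev_label: str, cur_label: str) -> int:
    """Indicator t_{y,y'}: 1 if the adjacent labels are exactly (y, y'), else 0."""
    return 1 if (prev_label == y and cur_label == y_next) else 0


def transition_counts(labels: str) -> Counter:
    """Sum the indicators over all adjacent label pairs in a sequence."""
    return Counter(zip(labels, labels[1:]))


counts = transition_counts("IIINII")
print(counts[("I", "I")], counts[("I", "N")], counts[("N", "I")])
```

With two labels (interface and non-interface) there are four such indicators, so the transition part of the model has four weights.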
Hidden Markov Support Vector Machine
Discriminant function
Corresponding weight
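Once the weights are learned, the label sequence that maximizes a discriminant function of this kind is typically found with Viterbi decoding. The sketch below assumes the weighted emission and transition scores have already been computed; the function name and the toy scores are hypothetical and are not taken from the slides.

```python
def viterbi(emission_scores, transition_scores, labels=("I", "N")):
    """Return the highest-scoring label string.

    emission_scores: list of dicts, one per residue, mapping label -> score.
    transition_scores: dict mapping (previous_label, label) -> score.
    """
    best = [{y: emission_scores[0][y] for y in labels}]
    back = []
    for scores in emission_scores[1:]:
        col, ptr = {}, {}
        for y in labels:
            # Best predecessor for label y at this position.
            prev = max(labels, key=lambda p: best[-1][p] + transition_scores[(p, y)])
            col[y] = best[-1][prev] + transition_scores[(prev, y)] + scores[y]
            ptr[y] = prev
        best.append(col)
        back.append(ptr)
    # Trace back from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return "".join(reversed(path))


emissions = [{"I": 1.0, "N": 0.0}, {"I": 0.4, "N": 0.6}, {"I": 1.0, "N": 0.0}]
flat = {(a, b): 0.0 for a in "IN" for b in "IN"}
sticky = {("I", "I"): 0.5, ("N", "N"): 0.5, ("I", "N"): -0.5, ("N", "I"): -0.5}
print(viterbi(emissions, flat), viterbi(emissions, sticky))
```

In the toy example, zero transition scores let the weakly non-interface middle residue keep its own label, while transition scores that favor staying in the same state pull it into the surrounding interface patch.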
Hidden Markov Support Vector Machine
Optimization problem
s.t.
Source Code: http://www.cs.cornell.edu/People/tj/svm_light/svm_hmm.html
☆ The cutting-plane algorithm makes training time linear in the number of training examples
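To train with the SVM-hmm package linked above, each residue becomes one line of the form `TAG qid:EXAMPLE FEATURE:VALUE ...`. The sketch below writes such lines, assuming this layout as documented on the linked page; the helper name, the tag mapping (`I` to 1, `N` to 2), and the feature numbering are hypothetical choices.

```python
def to_svm_hmm_lines(example_id: int, labels: str, features: list[dict[int, float]]) -> list[str]:
    """Format one protein chain as SVM-hmm input lines (one line per residue)."""
    tag_ids = {"I": 1, "N": 2}  # assumed mapping; tags are positive integers
    if len(labels) != len(features):
        raise ValueError("need exactly one feature dict per label")
    lines = []
    for tag, feats in zip(labels, features):
        # Feature indices are written in increasing order.
        feat_str = " ".join(f"{idx}:{val:g}" for idx, val in sorted(feats.items()))
        lines.append(f"{tag_ids[tag]} qid:{example_id} {feat_str}")
    return lines


example = to_svm_hmm_lines(1, "IN", [{2: 0.5, 1: 1.0}, {1: 0.25}])
print("\n".join(example))
```

The resulting file would then be passed to the package's `svm_hmm_learn` and `svm_hmm_classify` programs; the linked page is the reference for the exact options.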
DATA SET
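$$F_1 = \frac{2 \times \mathrm{Specificity}^{+} \times \mathrm{Sensitivity}^{+}}{\mathrm{Specificity}^{+} + \mathrm{Sensitivity}^{+}}$$

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FN)(TP + FP)(TN + FP)(TN + FN)}}$$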
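The evaluation measures can be computed directly from confusion-matrix counts. The sketch below assumes that Specificity+ means TP / (TP + FP) and Sensitivity+ means TP / (TP + FN), a convention common in binding-site prediction work but not stated on the slide; the function name is hypothetical.

```python
import math


def binding_site_metrics(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Compute F1, accuracy, and MCC from counts (zero denominators are not handled)."""
    specificity_pos = tp / (tp + fp)  # assumed definition of Specificity+
    sensitivity_pos = tp / (tp + fn)  # assumed definition of Sensitivity+
    f1 = 2 * specificity_pos * sensitivity_pos / (specificity_pos + sensitivity_pos)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fn) * (tp + fp) * (tn + fp) * (tn + fn)
    )
    return {"f1": f1, "accuracy": accuracy, "mcc": mcc}


metrics = binding_site_metrics(tp=40, tn=40, fp=10, fn=10)
print(metrics)
```

MCC uses all four counts, so it stays informative when interface residues are much rarer than non-interface residues, a case where accuracy alone can look deceptively high.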
Influence of the number of training samples on the prediction performance and running time
The inter-relation information between neighboring residues is relevant for discrimination
The window size has no significant influence on the performance
Comparison with related methods (figure panels: actual interface residues, ANN, SVM, CRF, HM-SVM)
SUMMARY
• Prediction of protein binding sites
• Hidden Markov Support Vector Machine
• Result Analysis
  • Comparison with other methods
  • Influence of the number of training samples
  • The information between neighboring residues
  • Window size
• Discussion