Base stacking classification via automated clustering method
Eli Hershkovits¹, Xavier Le Faucheur¹, Neocles Leontis², Allen Tannenbaum¹
¹Georgia Institute of Technology, ²BGSU
Data Classification
• Coordinate system and parameterization
• Clustering of the data (“by eye” or Automated clustering)
Base stacking: Ring coordinate system
• The three orthogonal directions of each ring are calculated with the Cremer and Pople method.
• The coordinates y1 and y2 can be used to define the face of the ring (up or down).
[Figure: two rings with local axes X1, Y1, Z1 and X2, Y2, Z2, and the vector r12 between them]
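The ring frame can be illustrated with a minimal sketch of the Cremer and Pople mean-plane construction. The plane normal below follows the published definition (the cross product of the sine-weighted and cosine-weighted sums of atom positions). The choice of the in-plane axes, through the projection of the first atom, and all function names are assumptions made for this sketch, not details taken from the slides.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unit(v):
    norm = math.sqrt(dot(v, v))
    return tuple(x / norm for x in v)

def ring_frame(atoms):
    """Return (center, in_plane, third, normal) for ring atoms given in ring order."""
    n = len(atoms)
    center = tuple(sum(a[i] for a in atoms) / n for i in range(3))
    rel = [tuple(a[i] - center[i] for i in range(3)) for a in atoms]
    # Cremer-Pople auxiliary vectors: sine- and cosine-weighted position sums.
    r_sin = tuple(sum(p[i] * math.sin(2 * math.pi * j / n) for j, p in enumerate(rel))
                  for i in range(3))
    r_cos = tuple(sum(p[i] * math.cos(2 * math.pi * j / n) for j, p in enumerate(rel))
                  for i in range(3))
    normal = unit(cross(r_sin, r_cos))
    # Assumption of this sketch: first in-plane axis points at the projection of atom 0.
    first = rel[0]
    in_plane = unit(tuple(first[i] - dot(first, normal) * normal[i] for i in range(3)))
    third = cross(normal, in_plane)  # completes a right-handed frame
    return center, in_plane, third, normal

# Idealized planar hexagon, atoms listed counter-clockwise when seen from +z.
hexagon = [(math.cos(2 * math.pi * j / 6), math.sin(2 * math.pi * j / 6), 0.0)
           for j in range(6)]
center, u, v, normal = ring_frame(hexagon)
```

Reversing the atom order flips the sign of the normal, which is one way a consistent atom numbering can distinguish the two faces of a ring.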
Base stacking: Relative coordinate system
• The relative coordinates of the two rings are defined by the spherical coordinates r, θ and φ.
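As an illustration of the relative coordinates, the sketch below expresses the center of the second ring in the local frame of the first ring and converts it to spherical coordinates. The angle convention (polar angle measured from the ring normal, azimuth measured in the ring plane) and the function name are assumptions of this sketch; the slides do not state which convention was used.

```python
import math

def relative_spherical(center1, axes1, center2):
    """Spherical coordinates (r, theta, phi) of center2 in the frame of ring 1.

    axes1 is a triple of orthonormal vectors (x_axis, y_axis, z_axis), where
    z_axis is the ring normal. The two centers are assumed to be distinct.
    """
    x_axis, y_axis, z_axis = axes1
    d = tuple(b - a for a, b in zip(center1, center2))
    x = sum(p * q for p, q in zip(d, x_axis))
    y = sum(p * q for p, q in zip(d, y_axis))
    z = sum(p * q for p, q in zip(d, z_axis))
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(max(-1.0, min(1.0, z / r)))  # polar angle from the normal
    phi = math.atan2(y, x)                         # azimuth in the ring plane
    return r, theta, phi

# A ring placed directly above the first ring along its normal.
axes = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
r, theta, phi = relative_spherical((0.0, 0.0, 0.0), axes, (0.0, 0.0, 3.4))
```

With this convention, a ring sitting directly over its partner has θ close to zero, while a laterally displaced ring has a larger θ.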
Primary Classification
• For each base stacking candidate, the two closest rings are chosen to represent the pair. This choice gives a classification into four groups: Pyrimidine-pyrimidine, Pyrimidine-imidazole, Imidazole-pyrimidine and Imidazole-imidazole.
• There are four possible combinations of face-face interactions: Up-up, Up-down, Down-up and Down-down.
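The primary classification can be sketched as follows. The code picks the closest pair of ring centers between two bases and combines the ring types with the two face labels. The data layout (a list of `(kind, center)` tuples per base), the function names and the example coordinates are hypothetical; the face labels are taken as given inputs because the slides do not spell out how they are computed.

```python
import math
from itertools import product

def closest_ring_pair(rings_a, rings_b):
    """Return the (ring_a, ring_b) pair whose centers are closest."""
    return min(product(rings_a, rings_b),
               key=lambda pair: math.dist(pair[0][1], pair[1][1]))

def primary_class(rings_a, rings_b, face_a, face_b):
    """Return (ring-pair group, face combination), e.g. ("Im-Pyr", "UD")."""
    (kind_a, _), (kind_b, _) = closest_ring_pair(rings_a, rings_b)
    return f"{kind_a}-{kind_b}", face_a + face_b

# Illustrative coordinates only: a purine (two rings) stacked on a pyrimidine (one ring).
purine = [("Pyr", (0.0, 0.0, 0.0)), ("Im", (2.1, 0.0, 0.0))]
pyrimidine = [("Pyr", (2.0, 0.0, 3.4))]
label = primary_class(purine, pyrimidine, "U", "D")

# The 4 ring-pair groups times the 4 face combinations.
primary_classes = [(f"{a}-{b}", fa + fb)
                   for a, b, fa, fb in product(("Pyr", "Im"), ("Pyr", "Im"), "UD", "UD")]
```

In this toy example the imidazole ring of the purine is the one nearest to the pyrimidine ring, so the pair falls in the Im-Pyr group.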
Parameters relevant for clustering
[Figure: two plots of the clustering parameters, one with axis label r]
Secondary classification
• The polar coordinates r, θ and φ are correlated and separate the data into two clusters: “proper stacking” and “improper stacking”.
• Together, these classifications give 4*4*2 = 32 classes (ring pair, face combination, proper or improper).
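The slides do not name the clustering algorithm, so the following is only a sketch of how a two-cluster split of (r, θ, φ) points could be automated, using plain k-means with k = 2 as a stand-in. The function name, the deterministic initialization and the toy points are assumptions of this sketch; the numbers are synthetic and not taken from the data. Euclidean distance also ignores the periodicity of the angles, which a real implementation would need to handle.

```python
import math
from itertools import product

def two_means(points, iterations=50):
    """Split points into two clusters with plain k-means (k = 2)."""
    # Deterministic start: the first point and the point farthest from it.
    centers = [points[0], max(points, key=lambda p: math.dist(p, points[0]))]
    labels = []
    for _ in range(iterations):
        new_labels = [0 if math.dist(p, centers[0]) <= math.dist(p, centers[1]) else 1
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers

# Synthetic (r, theta, phi) triples, invented for illustration only.
toy_points = [(3.4, 0.10, 1.0), (3.5, 0.20, 1.1), (3.6, 0.15, 0.9),
              (5.4, 0.90, 2.0), (5.6, 1.00, 2.1), (5.5, 0.80, 1.9)]
labels, centers = two_means(toy_points)

# Full label space: 4 ring pairs x 4 face combinations x 2 stacking types.
all_classes = list(product(("Pyr-Pyr", "Pyr-Im", "Im-Pyr", "Im-Im"),
                           ("UU", "UD", "DU", "DD"),
                           ("proper", "improper")))
```

Which of the two resulting clusters corresponds to proper stacking still has to be decided by inspecting the cluster centers.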
Pyr-Pyr
Relative orientation   proper   improper
UU 143C:G142 155C:C154
DD 511A:A509 743G:C699
UD 144A:G135 172U:G164
DU 147G:U146 897A:G765
Im-Pyr
Relative orientation   proper   improper
UU 132A:A131 231G:C230
DD 2813A:A2811 2792A:U2791
UD 226A:A215 273G:C271
DU 174A:C173
Pyr-Im
Relative orientation   proper   improper
UU 129A:A128 1360C:A1358
DD 129A:A116 2058G:G636
UD 176U:A174 922A:G921
DU 893G:G892 866U:A776
Im-Im
Relative orientation   proper   improper
UU 159G:G158 223G:G222
DD 2564G:A2513 1190G:A1189
UD 1626A:A1624
DU 1664A:G1663
Possible problems
• For stacking of residues that are not neighbors the distribution of is broad.
• Possible overlap between clusters.