Situs Modeling Workshop, San Diego, CA, Feb. 3-5, 2003
Vector Quantization and Reduced Models
Willy Wriggers, Ph.D.
Department of Molecular Biology
The Scripps Research Institute
10550 N. Torrey Pines Road, Mail TPC6
La Jolla, California, 92037
Reduced Representations of Biomolecular Structure
[Figure: codebook vectors $w_j$]
Feature points (fiducials, landmarks) reduce the complexity of the search space.
Useful for:
• Rigid-body fitting (today)
• Flexible fitting (today)
• Interactive fitting / force feedback (S. Birmanns, Tu 9AM)
• Building of deformable models (F. Tama, P. Chacón, Tu 10AM)
Vector Quantization
Lloyd (1957); Linde, Buzo, & Gray (1980): Digital Signal Processing, Speech and Image Compression.
Martinetz & Schulten (1993): Topology-Representing Network.
Encode data (in $\mathbb{R}^{d=3}$) using a finite set $\{w_j\}$ (j=1,…,k) of codebook vectors.
Delaunay triangulation divides $\mathbb{R}^3$ into k Voronoi polyhedra (“receptive fields”):
$$V_i = \{\, v \in \mathbb{R}^3 : \|v - w_i\| \le \|v - w_j\| \;\; \forall j \,\}$$
[Figure: codebook vectors $w_1$, $w_2$, $w_3$, $w_4$]
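The receptive-field definition above amounts to a nearest-neighbour assignment. A minimal Python sketch, assuming Euclidean distance and ties resolved toward the lowest index; the function names and the toy coordinates are illustrative and not taken from Situs:

```python
import math

def nearest_index(v, codebook):
    """Index j of the codebook vector closest to v (ties go to the lowest j)."""
    return min(range(len(codebook)), key=lambda j: math.dist(v, codebook[j]))

def voronoi_cells(points, codebook):
    """Group point indices by the receptive field (Voronoi cell) they fall into."""
    cells = [[] for _ in codebook]
    for i, v in enumerate(points):
        cells[nearest_index(v, codebook)].append(i)
    return cells

# Toy data (hypothetical): four points, two codebook vectors.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (9.0, 0.0, 0.0), (10.0, 1.0, 0.0)]
codebook = [(0.5, 0.0, 0.0), (9.5, 0.5, 0.0)]
cells = voronoi_cells(points, codebook)  # [[0, 1], [2, 3]]
```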
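Linde, Buzo, Gray (LBG) Algorithm
Encoding Distortion Error:
$$E = \sum_{i\ (\text{atoms, voxels})} \| v_i - w_{j(i)} \|^2 \, m_i ,$$
where $j(i)$ is the index of the codebook vector closest to $v_i$ and $m_i$ is the weight of data point $i$.
Lower $E(\{w_j(t)\})$ iteratively: Gradient descent
$$\forall r: \quad \Delta w_r(t) \equiv w_r(t) - w_r(t-1) = -\frac{\varepsilon}{2} \cdot \frac{\partial E}{\partial w_r} = \varepsilon \cdot \sum_i \delta_{r j(i)} \, (v_i - w_r) \, m_i .$$
Inline (Monte Carlo) approach for a sequence $v_i(t)$ selected at random according to weights $m_i$:
$$\Delta w_r(t) = \tilde{\varepsilon} \cdot \delta_{r j(i)} \cdot (v_i(t) - w_r).$$
How do we avoid getting trapped in the many local minima of E?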
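The batch gradient step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the step size `eps`, the iteration count, and the toy data are arbitrary choices, and the function names are hypothetical.

```python
import math

def nearest_index(v, codebook):
    """j(i): index of the codebook vector closest to v."""
    return min(range(len(codebook)), key=lambda j: math.dist(v, codebook[j]))

def distortion(points, weights, codebook):
    """E = sum_i m_i * |v_i - w_j(i)|^2."""
    return sum(m * math.dist(v, codebook[nearest_index(v, codebook)]) ** 2
               for v, m in zip(points, weights))

def lbg_step(points, weights, codebook, eps):
    """One batch update: w_r += eps * sum_i delta_{r,j(i)} * (v_i - w_r) * m_i."""
    dim = len(codebook[0])
    delta = [[0.0] * dim for _ in codebook]
    for v, m in zip(points, weights):
        r = nearest_index(v, codebook)          # only the "winner" is updated
        for d in range(dim):
            delta[r][d] += m * (v[d] - codebook[r][d])
    return [tuple(w[d] + eps * dw[d] for d in range(dim))
            for w, dw in zip(codebook, delta)]

# Toy data (hypothetical): two pairs of points on the x axis, unit weights.
points = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (10.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
weights = [1.0, 1.0, 1.0, 1.0]
codebook = [(0.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
history = [distortion(points, weights, codebook)]
for _ in range(50):
    codebook = lbg_step(points, weights, codebook, eps=0.1)
    history.append(distortion(points, weights, codebook))
```

In this toy case each codebook vector drifts to the centroid of its pair and the distortion recorded in `history` never increases; a poor starting codebook could still end in a local minimum, which is the problem the closing question raises.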
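Soft-Max Adaptation
Avoid local minima by smoothing of energy function (here: TRN method):
$$\forall r: \quad \Delta w_r(t) = \tilde{\varepsilon} \cdot e^{-s_r/\lambda} \cdot (v_i(t) - w_r),$$
where $s_r(v_i(t), \{w_j\})$ is the closeness rank:
$$\|v_i - w_{j_0}\| \le \|v_i - w_{j_1}\| \le \dots \le \|v_i - w_{j_{(k-1)}}\|, \qquad s_{j_0} = 0, \;\; s_{j_1} = 1, \;\; \dots, \;\; s_{j_{(k-1)}} = k-1 .$$
Note:
$\lambda \to 0$: LBG algorithm.
$\lambda \ne 0$: not only “winner” $w_{j(i)}$, also second, third, ... closest are updated.
Can show that this corresponds to stochastic gradient descent on
$$\tilde{E}(\{w_j\}, \lambda) = \sum_{r=1}^{k} \sum_i e^{-s_r/\lambda} \, \| v_i - w_r \|^2 \, m_i .$$
Note:
$\lambda \to 0$: $\tilde{E} \to E$, LBG algorithm.
$\lambda \to \infty$: $\tilde{E}$ parabolic (single minimum).
⇒ $\lambda(t)$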
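A rank-based update of this kind might look as follows in Python. The exponential schedules for $\lambda(t)$ and $\tilde{\varepsilon}(t)$, their start and end values, the uniform sampling (equal weights $m_i$), and the name `trn_fit` are all assumptions made for this sketch; the slides do not specify them.

```python
import math
import random

def trn_fit(points, k, steps=5000, lam_end=0.02, eps_start=0.3, eps_end=0.05, seed=0):
    """Online soft-max adaptation: every codebook vector moves by eps * exp(-s_r / lam)."""
    rng = random.Random(seed)
    lam_start = k / 2.0                                   # assumed starting value
    codebook = [list(rng.choice(points)) for _ in range(k)]
    for t in range(steps):
        frac = t / (steps - 1)
        lam = lam_start * (lam_end / lam_start) ** frac   # lambda(t), assumed schedule
        eps = eps_start * (eps_end / eps_start) ** frac   # step size, assumed schedule
        v = rng.choice(points)                            # uniform weights m_i
        order = sorted(range(k), key=lambda j: math.dist(v, codebook[j]))
        for rank, r in enumerate(order):                  # rank is the closeness rank s_r
            h = math.exp(-rank / lam)
            for d in range(len(v)):
                codebook[r][d] += eps * h * (v[d] - codebook[r][d])
    return [tuple(w) for w in codebook]

# Toy data (hypothetical): two well-separated clusters of four points each.
cluster_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
cluster_b = [(x + 20.0, y + 20.0, z + 20.0) for x, y, z in cluster_a]
fitted = trn_fit(cluster_a + cluster_b, k=2, seed=1)
```

Early on, when `lam` is large, both vectors respond to every sample; as `lam` shrinks only the winner moves, which recovers the LBG-style update.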
Question and Answer
Q: How do we know that we have found the global minimum of E?
A: We don’t (in general).
But we can compute the statistical variability of the $\{w_j\}$ by repeating the calculation with different seeds for the random number generator.
Codebook vector variability arises due to:
• statistical uncertainty,
• spread of local minima.
A small variability indicates good convergence behavior.
Optimum choice of # of vectors k: variability is minimal.
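One way to turn the seed-repetition idea into a number is sketched below. It makes several assumptions that are not from the slides: a plain Lloyd iteration stands in for the quantizer, each vector of the first run is matched to its nearest counterpart in every other run, and "variability" is taken as the rms of those matched distances.

```python
import math
import random

def lloyd(points, k, seed, iters=50):
    """Plain Lloyd iteration from a seed-dependent random start (stand-in quantizer)."""
    rng = random.Random(seed)
    codebook = [tuple(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        cells = [[] for _ in range(k)]
        for v in points:
            j = min(range(k), key=lambda r: math.dist(v, codebook[r]))
            cells[j].append(v)
        # Move each vector to the centroid of its cell; keep it if the cell is empty.
        codebook = [tuple(sum(c) / len(cell) for c in zip(*cell)) if cell else w
                    for cell, w in zip(cells, codebook)]
    return codebook

def variability(points, k, seeds):
    """rms distance between vectors of the first run and their nearest match in each other run."""
    runs = [lloyd(points, k, s) for s in seeds]
    ref = runs[0]
    sq = [min(math.dist(w, u) for u in run) ** 2 for run in runs[1:] for w in ref]
    return math.sqrt(sum(sq) / len(sq))

# Toy data (hypothetical): two tight groups of three points.
points = [(0.0, 0.0, 0.0), (0.2, 0.0, 0.0), (0.4, 0.0, 0.0),
          (10.0, 0.0, 0.0), (10.2, 0.0, 0.0), (10.4, 0.0, 0.0)]
spread_k2 = variability(points, k=2, seeds=range(5))
```

Scanning `k` over a range and picking the value with the smallest spread would follow the variability criterion stated above; the nearest-match pairing is a shortcut and a stricter implementation would pair vectors one-to-one.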
Single-Molecule Rigid-Body Docking
[Figure: EM low-res. data with codebook vectors $w_j^{(l)}$; X-tal structure with codebook vectors $w_j^{(h)}$]
• Estimate optimum k with variability criterion.
• Index map I: $m \to n$ (m, n = 1,…,k).
• k! = k (k-1)…2 possible combinations.
• For each index map I perform a least squares fit of the $w_{I(j)}^{(h)}$ to the $w_j^{(l)}$.
• Quality of I: residual rms deviation
$$\Delta_I = \sqrt{\frac{1}{k} \sum_{j=1}^{k} \left\| w_{I(j)}^{(h)} - w_j^{(l)} \right\|^2}$$
• Find optimal I by direct enumeration of the k! cases (minimum of $\Delta_I$).
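The direct enumeration over index maps can be illustrated with a small Python sketch. To keep it free of external libraries it works in 2D, where the least-squares rotation angle has a closed form; this is a deliberate simplification, since the slides work in 3D, where the optimal rotation would need, for example, an SVD-based superposition. Function names and coordinates are hypothetical.

```python
import itertools
import math

def fit_rmsd_2d(moving, target):
    """rms deviation after the optimal 2D rotation + translation of moving onto target."""
    k = len(moving)
    cm = [sum(p[d] for p in moving) / k for d in (0, 1)]
    ct = [sum(q[d] for q in target) / k for d in (0, 1)]
    P = [(p[0] - cm[0], p[1] - cm[1]) for p in moving]    # centered moving set
    Q = [(q[0] - ct[0], q[1] - ct[1]) for q in target]    # centered target set
    a = sum(p[0] * q[0] + p[1] * q[1] for p, q in zip(P, Q))
    b = sum(p[0] * q[1] - p[1] * q[0] for p, q in zip(P, Q))
    theta = math.atan2(b, a)                              # least-squares rotation angle
    c, s = math.cos(theta), math.sin(theta)
    sq = sum((c * p[0] - s * p[1] - q[0]) ** 2 + (s * p[0] + c * p[1] - q[1]) ** 2
             for p, q in zip(P, Q))
    return math.sqrt(sq / k)

def best_index_map(w_high, w_low):
    """Enumerate all k! index maps I; return (Delta_I, I) with minimal Delta_I."""
    k = len(w_low)
    return min((fit_rmsd_2d([w_high[i] for i in perm], w_low), perm)
               for perm in itertools.permutations(range(k)))

# Toy data (hypothetical): w_low is w_high rotated by 90 degrees, shifted, and reordered.
w_high = [(0.0, 0.0), (3.0, 0.0), (0.0, 1.0), (2.0, 2.0)]
w_low = [(4.0, -2.0), (5.0, -2.0), (3.0, 0.0), (5.0, 1.0)]
delta, index_map = best_index_map(w_high, w_low)
```

The cost grows as k!, so this brute-force search is only practical for the small k used in the slides.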
Reduced Search Features
Top 20, 7! = 5040 possible pairs of codebook vectors.
[Table columns: I (permutation), $C_{hl}$, $\Delta_I$]
For a fixed k, codebook rmsd is a more stringent criterion than the correlation coefficient!
Performance (I)
Dependence on experimental EM density threshold (ncd, k=7):
Orientations are stable: ±5° variability for ±50% threshold density variation.
Threshold level can be optimized via radius of gyration of vectors.
Dependence on resolution (simulated EM map, automatic assignment of k from $3 \le k \le 9$):
Deviation from start structure (PDB: 1TOP) used to generate simulated EM map.
[Figure labels: accurate matching up to ~30 Å; catastrophic misalignment]
Performance (II)
Is minimum vector variability a suitable choice for optimum k?
10 test systems, simulated EM densities from 2-100 Å, $3 \le k \le 9$.
2-20 Å (reliable fitting)
22-50 Å (borderline)
52-100 Å (mismatches)
Reasonable correlation with actual deviation.
No “false positives” for resolution values < 20 Å and variability < 1 Å.
Wriggers & Birmanns, J. Struct. Biol. 133, 193-202 (2001)
Performance (III)
Multiple Subunits
Egelman lab: High-resolution reconstructions of F-actin - plant ADF based on single-particle image processing.
Unrestrained vectors fail to distinguish between actin and ADF densities (poor segmentation).
Remedies:
• Skeletons (today)
• Correlation-Based Search (P. Chacón, today; J. Kovacs, tomorrow)
Conclusions (Rigid-Body Fitting)
“Classic” Situs fitting approach, versions 1.0-1.4.
Advantages of vector quantization:
• Fast (seconds of compute time).
• Reduced search is robust.
Limitations:
• No estimation of “fitting contrast” near optimum.
• Works best for single molecules, not for matching subunits to larger densities.
Flexible Fitting with Molecular Dynamics
[Figure: EM / SAXS low-res. data ($w_j^{(l)}$) and X-tal structure ($w_j^{(h)}$); constrain centroids; molecular dynamics simulation (X-PLOR)]
Flexible Docking of Elongation Factor G
© Joachim Frank, 1998
binding of EF-G and EF-Tu to the ribosome
Flexible Docking of Elongation Factor G
[Figure panels: rigid-body docking; flexible docking (5 vectors); flexible docking (10 vectors, variable number per domain)]
see Wriggers et al., Biophys. J. (2000) 79:1670-1678.
Note possible overfitting of domain IV!
Stereochemical Quality of Flexible Fitting
1.) Assumption: structure remains locally similar to the initial crystal structure.
In this case the precision is ~10 times above the nominal resolution of the EM map, but it is not known in advance whether the assumption holds.
2.) The atomic model has many more degrees of freedom than there are independent pieces of information in the EM map. Hence, there is the danger that overfitting distorts the structure.
How can overfitting be avoided? Reduce noise by eliminating “inessential” degrees of freedom!…
Skeletons Limit the Effect of Noise:
unrestrained vectors + exp. and meth. uncertainty = distortion
freezing inessential degrees of freedom:
skeleton distance constraints + exp. and meth. uncertainty = less distortion
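(i) Piecewise-Linear Inter- / Extrapolation
For each probe position find 4 closest vectors $w_1, \dots, w_4$ (with displacements $f_1, \dots, f_4$); $F = (F_x, F_y, F_z)$.
Ansatz:
$$F_x(x, y, z) = a x + b y + c z + d,$$
$$F_x(w_1) = f_{1,x}, \quad F_x(w_2) = f_{2,x}, \quad F_x(w_3) = f_{3,x}, \quad F_x(w_4) = f_{4,x} \quad (\text{similar for } F_y, F_z).$$
Cramer’s rule:
$$a = \frac{1}{D} \begin{vmatrix} f_{1,x} & w_{1,y} & w_{1,z} & 1 \\ f_{2,x} & w_{2,y} & w_{2,z} & 1 \\ f_{3,x} & w_{3,y} & w_{3,z} & 1 \\ f_{4,x} & w_{4,y} & w_{4,z} & 1 \end{vmatrix}, \quad b = \frac{1}{D} \begin{vmatrix} w_{1,x} & f_{1,x} & w_{1,z} & 1 \\ w_{2,x} & f_{2,x} & w_{2,z} & 1 \\ w_{3,x} & f_{3,x} & w_{3,z} & 1 \\ w_{4,x} & f_{4,x} & w_{4,z} & 1 \end{vmatrix}, \quad \dots,$$
$$D = \begin{vmatrix} w_{1,x} & w_{1,y} & w_{1,z} & 1 \\ w_{2,x} & w_{2,y} & w_{2,z} & 1 \\ w_{3,x} & w_{3,y} & w_{3,z} & 1 \\ w_{4,x} & w_{4,y} & w_{4,z} & 1 \end{vmatrix}.$$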
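As a rough illustration of the piecewise-linear scheme, the following Python sketch picks the 4 closest codebook vectors for a probe position and solves for the affine coefficients with Cramer's rule. The determinant routine, the coplanarity tolerance, and the function names are assumptions of this sketch rather than details from the slides.

```python
import math

def det(m):
    """Determinant by Laplace expansion along the first row (adequate for 4x4)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** c * m[0][c] * det([row[:c] + row[c + 1:] for row in m[1:]])
               for c in range(len(m)))

def interpolate_linear(probe, vectors, displacements):
    """Displacement at probe from the affine map through its 4 closest codebook vectors."""
    idx = sorted(range(len(vectors)), key=lambda j: math.dist(probe, vectors[j]))[:4]
    base = [list(vectors[j]) + [1.0] for j in idx]        # rows (w_x, w_y, w_z, 1)
    D = det(base)
    if abs(D) < 1e-12:                                    # assumed tolerance
        raise ValueError("the 4 closest vectors are (nearly) coplanar")
    result = []
    for comp in range(3):                                 # F_x, F_y, F_z in turn
        f = [displacements[j][comp] for j in idx]
        coeffs = []
        for col in range(4):                              # Cramer's rule for a, b, c, d
            m = [row[:] for row in base]
            for r in range(4):
                m[r][col] = f[r]
            coeffs.append(det(m) / D)
        a, b, c, d = coeffs
        result.append(a * probe[0] + b * probe[1] + c * probe[2] + d)
    return tuple(result)

# Toy data (hypothetical): the displacement field is affine, so it is reproduced exactly.
vectors = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (5.0, 5.0, 5.0)]
displacements = [(2 * x + 1, y - z, 3.0) for x, y, z in vectors]
F = interpolate_linear((0.2, 0.3, 0.1), vectors, displacements)
```

Because the set of 4 nearest vectors changes from probe to probe, the resulting field is only piecewise linear and need not be continuous across the switch-over.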
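(ii) Non-Linear Kernel Interpolation
Consider all k vectors and interpolation kernel function U(r).
Ansatz:
$$F_x(x, y, z) = a_1 + a_x x + a_y y + a_z z + \sum_{i=1}^{k} b_i \cdot U\big(\| w_i - (x, y, z) \|\big),$$
$$F_x(w_i) = f_{i,x} \;\; \forall i \quad (\text{similar for } F_y, F_z).$$
Solve:
$$L^{-1} \, (f_{1,x}, \dots, f_{k,x}, 0, 0, 0, 0)^T = (b_1, \dots, b_k, a_1, a_x, a_y, a_z)^T,$$
where
$$L = \begin{pmatrix} P & Q \\ Q^T & 0 \end{pmatrix}, \quad (k+4) \times (k+4),$$
$$Q = \begin{pmatrix} 1 & w_{1,x} & w_{1,y} & w_{1,z} \\ \vdots & \vdots & \vdots & \vdots \\ 1 & w_{k,x} & w_{k,y} & w_{k,z} \end{pmatrix}, \quad k \times 4,$$
$$P = \begin{pmatrix} 0 & U(w_{12}) & \cdots & U(w_{1k}) \\ U(w_{21}) & 0 & \cdots & U(w_{2k}) \\ \vdots & \vdots & \ddots & \vdots \\ U(w_{k1}) & U(w_{k2}) & \cdots & 0 \end{pmatrix}, \quad k \times k,$$
with $w_{ij} = \| w_i - w_j \|$.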
Bookstein “Thin-Plate” Splines
• kernel function U(r) is principal solution of biharmonic equation that arises in elasticity theory of thin plates:
$$\Delta^2 U(r) = \nabla^4 U(r) = \delta(r).$$
• variational principle: U(r) minimizes the bending energy (not shown).
• 1D: $U(r) = |r^3|$ (cubic spline)
• 2D: $U(r) = r^2 \log r^2$
• 3D: $U(r) = |r|$
• we are interested mainly in 3D case but will also consider 2D (differentiable).
[Figure: 2D kernel U(r); $F(x, y)$]
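The kernel interpolation with the 3D kernel $U(r) = |r|$ can be sketched in pure Python as follows. The sketch assembles the $(k+4) \times (k+4)$ system for one displacement component and solves it by Gaussian elimination; the solver, the singularity tolerance, and the function names are illustrative assumptions, and a production code would normally use a linear algebra library.

```python
import math

def solve(A, b):
    """Gaussian elimination with partial pivoting; returns x with A x = b."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:                      # assumed tolerance
            raise ValueError("singular system")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def tps_fit(vectors, values, U=lambda r: r):
    """Coefficients (b_1..b_k, a_1, a_x, a_y, a_z) for one displacement component."""
    k = len(vectors)
    L = [[0.0] * (k + 4) for _ in range(k + 4)]
    for i in range(k):
        for j in range(k):
            if i != j:
                L[i][j] = U(math.dist(vectors[i], vectors[j]))   # block P
        q = [1.0] + list(vectors[i])
        for c in range(4):
            L[i][k + c] = q[c]                                   # block Q
            L[k + c][i] = q[c]                                   # block Q^T
    return solve(L, list(values) + [0.0] * 4)

def tps_eval(point, vectors, coeffs, U=lambda r: r):
    """Evaluate the affine part plus the kernel sum at an arbitrary point."""
    k = len(vectors)
    a1, ax, ay, az = coeffs[k:]
    affine = a1 + ax * point[0] + ay * point[1] + az * point[2]
    return affine + sum(coeffs[i] * U(math.dist(point, vectors[i])) for i in range(k))

# Toy data (hypothetical): five non-coplanar vectors with x displacements.
vectors = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (1.0, 1.0, 1.0)]
fx = [0.0, 1.0, -1.0, 0.5, 2.0]
coeffs = tps_fit(vectors, fx)
```

Calling `tps_fit` once per component ($F_x$, $F_y$, $F_z$) gives the full displacement field; the vectors must not all lie in one plane, otherwise the affine part is underdetermined and the system is singular.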
Interpol RNAP 1
Taq RNAP x-tal structure
Interpol RNAP 2
Flexibly fitted (MD) structure
Interpol RNAP 3
Piecewise-linear inter- / extrapolation
Interpol RNAP 4
Thin-plate splines, 2D ($r^2 \log r^2$) kernel
Interpol RNAP 5
Thin-plate splines, 3D |r| kernel
Summary
Reduced (vector quantization) representations are useful for a variety of applications:
• Rigid-body docking.
• (Fast computation of forces and torques for haptic devices - S. Birmanns).
• Flexible fitting with molecular dynamics.
• Estimation of displacement vector fields.
(Non-linear) interpolation is a viable alternative to MD in flexible fitting if stereochemical quality is optimized after morphing.
Interpolation allows displacements of vectors to be interpolated to full space, useful in Normal Modes Analysis (F. Tama, P. Chacón).
Availability: Situs 2.2
Acknowledgements
Pablo Chacón, Julio Kovacs, Stefan Birmanns.
Klaus Schulten, UIUC
Ron Milligan, TSRI
Edward H. Egelman, U Virginia
Joachim Frank, Wadsworth Center
Seth Darst, Rockefeller U
La Jolla Interfaces in Science Program
Grants from NIH to W.W. and Charles Brooks, III (MMTSB)