vector quantization and reduced modelssitus.biomachina.org/sd03/willy2.pdf · vector quantization...

22
Situs Modeling Workshop, San Diego, CA, Feb. 3-5, 2003 Vector Quantization and Reduced Models Willy Wriggers, Ph.D. Department of Molecular Biology The Scripps Research Institute 10550 N. Torrey Pines Road, Mail TPC6 La Jolla, California, 92037

Upload: others

Post on 22-Mar-2020

17 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Vector Quantization and Reduced Modelssitus.biomachina.org/sd03/willy2.pdf · Vector Quantization and Reduced Models Willy Wriggers, Ph.D. ... Threshold level can be optimized via

Situs Modeling Workshop, San Diego, CA, Feb. 3-5, 2003

Vector Quantizationand Reduced Models

Willy Wriggers, Ph.D.Department of Molecular BiologyThe Scripps Research Institute

10550 N. Torrey Pines Road, Mail TPC6

La Jolla, California, 92037

Page 2: Vector Quantization and Reduced Modelssitus.biomachina.org/sd03/willy2.pdf · Vector Quantization and Reduced Models Willy Wriggers, Ph.D. ... Threshold level can be optimized via

1

Reduced Representations of BiomolecularStructure

Reduced Representations of BiomolecularStructure

emiwcalc

jw

Feature points (fiducials, landmarks), reduce complexity of search space

Useful for:

•Rigid-body fitting (today)•Flexible fitting (today)•Interactive fitting / force feedback (S. Birmanns, Tu 9AM)•Building of deformable models (F. Tama, P. Chacon, Tu 10AM)

1w

2w

3w

4w

Vector QuantizationLloyd (1957)

Linde, Buzo, & Gray (1980)Martinetz & Schulten (1993)

Digital Signal Processing,Speech and Image Compression.Topology-Representing Network.

}

{ }jwEncode data (in ) using a finite set (j=1,…,k) of codebook vectors.3=ℜ d

Delaunay triangulation divides into k Voronoi polyhedra (“receptive fields”):3ℜ

{ }jwvwvv jii ∀−≤−ℜ∈= 3V

Page 3: Vector Quantization and Reduced Modelssitus.biomachina.org/sd03/willy2.pdf · Vector Quantization and Reduced Models Willy Wriggers, Ph.D. ... Threshold level can be optimized via

2

1w

2w

3w

4w

Linde, Buzo, Gray (LBG) AlgorithmEncoding Distortion Error:

ii

mijiE wv∑ −=voxels)(atoms,

2

)(

Lower iteratively: Gradient descent( ){ }( )twE j

( ) ( ) ( ) .2

1)( )(∑ −⋅=∂∂⋅−=−−≡∆

iiriirj

rrrr mwv

w

Etwtwtw δεε

:r∀

Inline (Monte Carlo) approach for a sequence selected at randomaccording to weights

( )tvi:im

( )( ).~)( )( riirjr wtvtw −⋅⋅=∆ δεHow do we avoid getting trapped in the many local minima of E?

Soft-Max Adaptation

110

)1(10

−===

−≤≤−≤− −

ksss

wvwvwv

rrr

kjijiji …

{ }( ) .)(,~ 2k

1ri

i

s

j mijiewE wvr

∑ −∑−

=

λ

( ) { }( )jir wtvs ,

Avoid local minima by smoothing of energy function (here: TRN method):

( )( ) ,~)(: ri

s

r wtvetwrr

−⋅⋅=∆∀−λε

Where is the closeness rank:

Note: LBG algorithm.not only “winner” , also second, third, ... closest are updated.

:0→λ:0≠λ ( )ijw

Can show that this corresponds to stochastic gradient descent on

Note: LBG algorithm.parabolic (single minimum).

.~

:0 EE →→λE~

:∞→λ } ( )tλ⇒

Page 4: Vector Quantization and Reduced Modelssitus.biomachina.org/sd03/willy2.pdf · Vector Quantization and Reduced Models Willy Wriggers, Ph.D. ... Threshold level can be optimized via

3

QuestionAnswer

Codebook vector variability arises due to:• statistical uncertainty,• spread of local minima.

A small variability indicates good convergence behavior.Optimum choice of # of vectors k: variability is minimal.

Q: How do we know that we have found the global minimum of E?

A: We don’t (in general).

But we can compute the statistical variability of the by repeating thecalculation with different seeds for random number generator.

{ }jw

EMlow res. data

Xtalstructure

)(hjw )(l

jw

•Estimate optimum k with variability criterion.•Index map I: (m, n = 1,…,k).• k! = k (k-1)…2 possible combinations.• For each index map I perform a least squares fit of the to the .• Quality of I: residual rms deviation

• Find optimal I by direct enumeration of the k! cases (minimum of ).

nm →

)(ljw)(

)(h

jIw

I∆

∑ −=

=∆k

jI

lj

hjIk

ww1

2)()(

)(1

Single-Molecule Rigid-Body Docking

Page 5: Vector Quantization and Reduced Modelssitus.biomachina.org/sd03/willy2.pdf · Vector Quantization and Reduced Models Willy Wriggers, Ph.D. ... Threshold level can be optimized via
Page 6: Vector Quantization and Reduced Modelssitus.biomachina.org/sd03/willy2.pdf · Vector Quantization and Reduced Models Willy Wriggers, Ph.D. ... Threshold level can be optimized via

5

Reduced Search FeaturesTop 20, 7!=5040 possible pairsof codebook vectors.

I (permutation)hlCI∆

For a fixed k, codebookrmsd is more stringentcriterion than correlationcoefficient!

Performance (I)Dependence on experimental EM density threshold (ncd, k=7):

orientations are stable:+/- 5o variability for +/-50% threshold density variation.Threshold level can be optimized via radius of gyration of vectors.

Dependence on resolution (simulated EM map, automatic assignment ofk from ):3 9k≤ ≤

Deviation from start structure (PDB: 1TOP)used to generate simulated EM map.

Accurate matching up to ~30ÅCatastrophic misalignment

Page 7: Vector Quantization and Reduced Modelssitus.biomachina.org/sd03/willy2.pdf · Vector Quantization and Reduced Models Willy Wriggers, Ph.D. ... Threshold level can be optimized via

6

Performance (II)Is minimum vector variability a suitable choice for optimum k?

10 test systems,simulated EM densitiesfrom 2-100Å.

2-20Å (reliable fitting)22-50Å (borderline)52-100Å (mismatches)

Reasonable correlationwith actual deviation

No “false positives” forresolution values < 20Åand variability < 1Å.

3 9k≤ ≤

Wriggers & Birmanns, J. Struct. Biol 133, 193-202 (2001)

Performance (III)

Multiple Subunits

Egelman lab: High-resolutionreconstructions of F-actin - plant ADFbased on single-particle imageprocessing.

Unrestrained vectors fail to distinguishbetween actin and ADF densities (poorsegmentation)

Remedies:

•Skeletons (today)•Correlation-Based Search (P Chacón,today; J. Kovacs, tomorrow)

Page 8: Vector Quantization and Reduced Modelssitus.biomachina.org/sd03/willy2.pdf · Vector Quantization and Reduced Models Willy Wriggers, Ph.D. ... Threshold level can be optimized via

7

Conclusions (Rigid-Body Fitting)

“Classic” Situs fitting approach, versions 1.0-1.4.

Advantages of vector quantization:•Fast (seconds of compute time).•Reduced search is robust.

Limitations:•No estimation of “fitting contrast” near optimum•Works best for single molecules, not for matching subunits to largerdensities.

Flexible Fitting with Molecular Dynamics

)(hjw )(l

jw

EM / SAXSlow res. data

Xtalstructure

constraincentroids

moleculardynamicssimulation(X-PLOR)

Page 9: Vector Quantization and Reduced Modelssitus.biomachina.org/sd03/willy2.pdf · Vector Quantization and Reduced Models Willy Wriggers, Ph.D. ... Threshold level can be optimized via
Page 10: Vector Quantization and Reduced Modelssitus.biomachina.org/sd03/willy2.pdf · Vector Quantization and Reduced Models Willy Wriggers, Ph.D. ... Threshold level can be optimized via

9

Flexible Docking of Elongation Factor G

© Joachim Frank, 1998

binding of EF-Gand EF-Tuto the ribosome

Flexible Docking of Elongation Factor G

rigid-body docking flexible docking(5 vectors)

flexible docking(10 vectors,variable number perdomain)

see Wriggers et al., Biophys. J. (2000) 79:1670-1678.

Note possible overfitting of domain IV!

Page 11: Vector Quantization and Reduced Modelssitus.biomachina.org/sd03/willy2.pdf · Vector Quantization and Reduced Models Willy Wriggers, Ph.D. ... Threshold level can be optimized via

10

Stereochemical Quality of Flexible Fitting

1.) Assumption: structure remains locally similar to the initial crystal structure.

In this case precision: ~10 times above the nominal resolution of the EMmap, but it is not known in advance if the assumption holds.

2.) The atomic model has many more degrees of freedom than there areindependent pieces of information in the EM map. Hence, there is the dangerthat overfitting distorts the structure

How can overfitting be avoided? Reduce noise by eliminating “inessential”degrees of freedom!…

Skeletons Limit the Effect of Noise:

+ =

=+

unrestrained vectors exp. and meth. uncertainty distortion

skeleton distance constraints less distortion

freezing inessential degrees of freedom:

Page 12: Vector Quantization and Reduced Modelssitus.biomachina.org/sd03/willy2.pdf · Vector Quantization and Reduced Models Willy Wriggers, Ph.D. ... Threshold level can be optimized via
Page 13: Vector Quantization and Reduced Modelssitus.biomachina.org/sd03/willy2.pdf · Vector Quantization and Reduced Models Willy Wriggers, Ph.D. ... Threshold level can be optimized via
Page 14: Vector Quantization and Reduced Modelssitus.biomachina.org/sd03/willy2.pdf · Vector Quantization and Reduced Models Willy Wriggers, Ph.D. ... Threshold level can be optimized via
Page 15: Vector Quantization and Reduced Modelssitus.biomachina.org/sd03/willy2.pdf · Vector Quantization and Reduced Models Willy Wriggers, Ph.D. ... Threshold level can be optimized via
Page 16: Vector Quantization and Reduced Modelssitus.biomachina.org/sd03/willy2.pdf · Vector Quantization and Reduced Models Willy Wriggers, Ph.D. ... Threshold level can be optimized via
Page 17: Vector Quantization and Reduced Modelssitus.biomachina.org/sd03/willy2.pdf · Vector Quantization and Reduced Models Willy Wriggers, Ph.D. ... Threshold level can be optimized via
Page 18: Vector Quantization and Reduced Modelssitus.biomachina.org/sd03/willy2.pdf · Vector Quantization and Reduced Models Willy Wriggers, Ph.D. ... Threshold level can be optimized via

17

(i) Piecewise-Linear Inter- / ExtrapolationFor each probe position find 4 closest vectors.

Ansatz:

Cramer’s rule:

1w

3w

2w

4w( , , )x y zF F F=F

1 1,

2 2,

3 3,

4 4,

( , , )

( ) ,

( ) ,

( ) ,

( ) (similar for , ).

x

x x

x x

x x

x x y z

F x y z ax by xz d

F f

F f

F f

F f F F

= + + +====

w

w

w

w

1f 2f

4f

3f

1, 1, 1, 1, 1, 1,

2, 2, 2, 2, 2, 2,

3, 3, 3, 3, 3, 3,

4, 4, 4, 4, 4, 4,

1 1

1 1

1 1

1 1, , ,

x y z x y z

x y z x y z

x y z x y z

x y z x y z

f w w w f w

f w w w f w

f w w w f w

f w w w f wa b

D D= = #

1, 1, 1,

2, 2, 2,

3, 3, 3,

4, 4, 4,

1

1

1

1

x y z

x y z

x y z

x y z

w w w

w w wD

w w w

w w w

=

(ii) Non-Linear Kernel InterpolationConsider all k vectors and interpolation kernel function U(r).

Ansatz:

Solve :

( )11

,

( , , ) ( , , )

( ) , (similar for , ).

k

x x y z i ik

x i i x y z

F x y z a a x a y a z b U x y z

F f i F F=

= + + + + ⋅ −

= ∀

∑ w

w

11, , 1 1

1, 1, 1,

, , ,

12 1

21 2

1 2

L ( , , , 0,0,0,0) ( , , , , , , ) ,

1

where , , 4,

1

0 ( ) ( )

( ) 0 ( ), .

( ) ( ) 0

x k x k x y z

x y z

k x k y k z

k

k

k k

f f b b a a a a

w w w

k

w w w

U w U w

U w U wk k

U w U w

− =

= = ×

= ×

T

T

P QL Q

Q 0

P

# #

# # # #

##

# # # ##

Page 19: Vector Quantization and Reduced Modelssitus.biomachina.org/sd03/willy2.pdf · Vector Quantization and Reduced Models Willy Wriggers, Ph.D. ... Threshold level can be optimized via

18

• kernel function U(r) is principal solution of biharmonic equation that arises inelasticity theory of thin plates:

• variational principle: U(r) minimizes the bending energy (not shown).• 1D: U(r) = |r3| (cubic spline)• 2D: U(r) = r2 log r2

• 3D: U(r) = |r|

2D: U(r)

• we are interested mainly in 3D case but will also consider 2D (differentiable).

Bookstein “Thin-Plate” Splines

2 4( ) ( ) δ( ).U r U r r∆ = ∇ =

( , )x yF

Interpol RNAP 1

Taq RNAP x-tal structure

Page 20: Vector Quantization and Reduced Modelssitus.biomachina.org/sd03/willy2.pdf · Vector Quantization and Reduced Models Willy Wriggers, Ph.D. ... Threshold level can be optimized via

19

Interpol RNAP 2

Flexibly fitted (MD) structure

Interpol RNAP 3

Piecewise-linear inter- / extrapolation

Page 21: Vector Quantization and Reduced Modelssitus.biomachina.org/sd03/willy2.pdf · Vector Quantization and Reduced Models Willy Wriggers, Ph.D. ... Threshold level can be optimized via

20

Interpol RNAP 4

Thin-plate splines, 2D (r2 log r2) kernel

Interpol RNAP 5

Thin-plate splines, 3D |r| kernel

Page 22: Vector Quantization and Reduced Modelssitus.biomachina.org/sd03/willy2.pdf · Vector Quantization and Reduced Models Willy Wriggers, Ph.D. ... Threshold level can be optimized via

21

Summary

Reduced (vector quantization) representations are useful for a varietyof applications:

• Rigid-body docking.• (Fast computation of forces and torques for haptic devices - S. Birmanns).• Flexible fitting with molecular dynamics.• Estimation of displacement vector fields.

(Non-linear) Interpolation is a viable alternative to MD in flexible fittingif stereochemical quality is optimized after morphing.

Interpolation allows displacements of vectors to be interpolated to fullspace, useful in Normal Modes Analysis (F. Tama, P. Chacón).

Availability: Situs 2.2

Acknowledgements

Pablo Chacón, Julio Kovacs, Stefan Birmanns.

Klaus Schulten, UIUCRon Milligan, TSRIEdward H. Egelman, U VirginiaJoachim Frank, Wadsworth CenterSeth Darst, Rockefeller U

La Jolla Interfaces in Science ProgramGrants from NIH to W.W. and Charles Brooks, III (MMTSB)