Principal Component Analysis in MD Simulation Speaker: ZHOU Chen-Yang Supervisor: Wu Yun-Dong


Posted on 17-Dec-2015


Page 1

Principal Component Analysis in MD Simulation

Speaker: ZHOU Chen-Yang

Supervisor: Wu Yun-Dong

Page 2

Methods to analyze MD trajectory

• Intuition-based coordinates
  – RMSD with respect to the native state
  – Fraction of native contacts
  – Radius of gyration
  – Other observables

• Advantages
  – Easy to understand
  – Convenient to compute

• Disadvantages
  – Inaccurate
  – Ineffective for non-native structures, or when no good reference structure is available
  – Depend on prior knowledge
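The intuition-based coordinates listed above are simple to compute, which is their main appeal. As an illustration, the sketch below computes an RMSD after optimal rigid-body superposition and a mass-weighted radius of gyration with NumPy. It assumes each structure is an (n_atoms, 3) array with atoms in the same order; the Kabsch superposition and the function names are choices made for this sketch, not something the slides specify.

```python
import numpy as np


def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration of one structure; coords has shape (n_atoms, 3)."""
    if masses is None:
        masses = np.ones(len(coords))  # equal weights if no masses are given
    center = np.average(coords, axis=0, weights=masses)
    sq_dist = np.sum((coords - center) ** 2, axis=1)
    return float(np.sqrt(np.average(sq_dist, weights=masses)))


def kabsch_rmsd(coords, reference):
    """RMSD between two (n_atoms, 3) structures after optimal rigid-body superposition."""
    p = coords - coords.mean(axis=0)        # remove overall translation
    q = reference - reference.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)       # SVD of the 3x3 cross-covariance matrix
    d = 1.0 if np.linalg.det(vt.T @ u.T) > 0 else -1.0  # forbid reflections
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rotation.T - q               # rotate every atom of p onto q
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))
```

A rotated and translated copy of a structure gives an RMSD of essentially zero, while its mirror image does not, because the determinant check rules out reflections.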

Page 3

How to measure conformational change?

What we have to do:

• Reduce dimension
  – The trajectory is too complicated
  – A good projection should be able to separate signal from noise

• Classification/clustering
  – Classify structures into different states

• Algorithms include:
  – PCA: Principal Component Analysis
  – MDS: Multi-Dimensional Scaling

If we already had an optimal reaction coordinate, we would directly obtain the free energy landscape, transition pathway, transition rate ...

But usually we don't, and it does not come up automatically.

Page 4

dPCA vs RMSD

The figure shows the free energy landscape of Trp-zip2 at 300 K, simulated with the Amber force field 99sb*-ildn and projected onto the 2nd principal component and the RMSD.
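Free energy landscapes like this one are commonly estimated from the sampled distribution of the projected coordinates as G(x, y) = -k_B T ln P(x, y). The sketch below does this with a 2D histogram. The bin count, the kJ/mol units, and shifting the minimum to zero are assumptions for illustration, not the speaker's documented settings.

```python
import numpy as np

KB_KJ_PER_MOL_K = 0.0083144626  # Boltzmann constant in kJ/(mol*K)


def free_energy_landscape(x, y, temperature=300.0, bins=50):
    """Estimate G(x, y) = -kB*T*ln P(x, y) from sampled 2D coordinates.

    Empty bins get +inf; the landscape is shifted so its minimum is 0.
    """
    hist, x_edges, y_edges = np.histogram2d(x, y, bins=bins, density=True)
    kt = KB_KJ_PER_MOL_K * temperature
    with np.errstate(divide="ignore"):
        g = -kt * np.log(hist)  # log(0) = -inf, so empty bins become +inf
    g -= g[np.isfinite(g)].min()  # put the global minimum at G = 0
    return g, x_edges, y_edges
```

Here x and y would be, for example, the 2nd principal component and the RMSD of each saved frame. A bin sampled three times as often as another lies k_B T ln 3 lower in free energy.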

Page 5

General description of PCA

• The central idea of PCA is to:
  – reduce the dimension
  – while retaining the variation

• An example:
  – (x, y) is a randomly generated dataset
    • var(x) = 3.2, var(y) = 2.3
  – Each (x, y) point is centered either at (0, 0) or at (3, 3), and the two groups are mixed
  – PCA generates new coordinates (x', y'), and x' captures most of the variation
    • var(x') = 5.5, var(y') = 0.99
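The example above can be reproduced in spirit with NumPy. This sketch assumes each group is a unit-variance Gaussian cloud of 500 points; the slide does not give its generator, sample size, or spread, so the printed numbers will not match it exactly. For this setup the theoretical values are var(x) = var(y) = 3.25, var(x') = 5.5 and var(y') = 1, and a finite sample scatters around them. Because PCA only rotates the centered data, the total variance var(x) + var(y) is unchanged.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical data: two mixed unit-variance clouds centered at (0, 0) and (3, 3)
n = 500
cloud_a = rng.normal(loc=(0.0, 0.0), scale=1.0, size=(n, 2))
cloud_b = rng.normal(loc=(3.0, 3.0), scale=1.0, size=(n, 2))
data = np.vstack([cloud_a, cloud_b])

centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)       # 2x2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)     # ascending eigenvalues
order = np.argsort(eigvals)[::-1]          # largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

new_coords = centered @ eigvecs            # columns are x' and y'
old_var = np.var(data, axis=0, ddof=1)
new_var = np.var(new_coords, axis=0, ddof=1)
print("var(x), var(y):  ", old_var)
print("var(x'), var(y'):", new_var)
print("x' direction:", eigvecs[:, 0])      # close to ±(1, 1)/sqrt(2)
```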

Page 6

Key question in understanding PCA

• In practice, the principal components (PCs) are linear combinations of the original coordinates.

• Suppose we have a data set with two columns, X1 and X2, and we generate a new column Z = a1X1 + a2X2. What is the variance of Z?

Variance and covariance:

$$\mathrm{var}(Z) = a_1^2\,\mathrm{var}(X_1) + a_2^2\,\mathrm{var}(X_2) + 2a_1a_2\,\mathrm{cov}(X_1, X_2)$$

Example: for Z = X1 + X2, $\mathrm{var}(Z) = \mathrm{var}(X_1) + \mathrm{var}(X_2) + 2\,\mathrm{cov}(X_1, X_2)$.

Why is this important? Because we are going to project the data set onto a new coordinate Z, and our aim is to choose (a1, a2) so as to maximize the variance of Z.

Page 7

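Z = a1X1 + a2X2, represented with matrix multiplication:

$$Z = \alpha^\top X, \qquad \alpha = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}, \qquad X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}$$

Covariance matrix: Σ. Coefficients of the original coordinates in the PC: α.

$$\mathrm{var}(Z) = \mathrm{var}(\alpha^\top X) = \alpha^\top \Sigma \alpha$$

Z = X1 + X2 corresponds to α = (1, 1)^T, so

$$\mathrm{var}(Z) = \begin{pmatrix} 1 & 1 \end{pmatrix} \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \Sigma_{11} + \Sigma_{22} + 2\Sigma_{12}$$

Next step: vary α to search for the maximum of var(Z).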
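The identity var(αᵀX) = αᵀΣα is easy to check numerically. The sketch below builds a hypothetical pair of correlated columns X1 and X2 and computes the variance of Z = X1 + X2 three ways: directly from the data, as the quadratic form αᵀΣα, and through the expanded variance/covariance formula. All three use the same sample normalization (ddof=1), so they agree to rounding error.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: two correlated columns X1 and X2, one sample per row
x1 = rng.normal(size=2000)
x2 = 0.6 * x1 + rng.normal(scale=0.8, size=2000)
data = np.column_stack([x1, x2])

sigma = np.cov(data, rowvar=False)   # 2x2 sample covariance matrix (ddof=1)
alpha = np.array([1.0, 1.0])         # coefficients for Z = X1 + X2

z = data @ alpha                     # the new column Z, one value per sample
direct = np.var(z, ddof=1)           # variance of Z computed from the data
quadratic_form = alpha @ sigma @ alpha
expanded = sigma[0, 0] + sigma[1, 1] + 2.0 * sigma[0, 1]

# All three agree up to floating-point rounding
print(direct, quadratic_form, expanded)
```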

Page 8

Maximize var(Z)

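First, we have to normalize α, because otherwise var(Z) could be made arbitrarily large simply by scaling α:

$$\alpha^\top \alpha = 1$$

Then, maximizing var(Z) under this constraint means maximizing the Lagrangian

$$L(\alpha, \lambda) = \alpha^\top \Sigma \alpha - \lambda\,(\alpha^\top \alpha - 1)$$

Differentiating with respect to α and setting the derivative to zero:

$$\frac{\partial L}{\partial \alpha} = 2\Sigma\alpha - 2\lambda\alpha = 0 \;\Rightarrow\; \Sigma\alpha = \lambda\alpha$$

So λ is an eigenvalue and α is the corresponding eigenvector of Σ. Since $\mathrm{var}(Z) = \alpha^\top \Sigma \alpha = \lambda$, the first PC is the eigenvector with the largest eigenvalue.

Eigenvalues plotted from large to small

Pick the first several eigenvectors as PCs (more precisely, as the coefficients of the PCs). Then project the data onto the PCs; the simplified data can be further analyzed with other techniques such as clustering.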
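The procedure on this slide (covariance matrix, eigendecomposition, eigenvalues sorted from large to small, projection onto the leading eigenvectors) can be sketched in NumPy as follows. The input is assumed to hold one MD frame per row, using whatever coordinates were chosen as features; the function name `pca` and the random stand-in data are hypothetical. `np.linalg.eigh` is used because Σ is symmetric; it returns eigenvalues in ascending order, hence the explicit re-sort.

```python
import numpy as np


def pca(data, n_components):
    """PCA via eigendecomposition of the covariance matrix.

    data: shape (n_samples, n_features), e.g. one MD frame per row.
    Returns eigenvalues sorted from large to small, the matching eigenvectors
    (one PC coefficient vector per column), and the data projected onto the
    first n_components PCs.
    """
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # symmetric matrix, ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # re-sort from large to small
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    projected = centered @ eigvecs[:, :n_components]
    return eigvals, eigvecs, projected


# Usage with hypothetical correlated features standing in for MD data
rng = np.random.default_rng(0)
frames = rng.normal(size=(1000, 5)) @ rng.normal(size=(5, 5))
eigvals, eigvecs, projected = pca(frames, n_components=2)
explained = eigvals / eigvals.sum()          # fraction of variance per PC (scree plot data)
print(np.round(explained, 3))
```

The sample variance of each projected column equals the corresponding eigenvalue λ, matching the derivation on this slide. The projected columns are what would be passed on to clustering.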

Page 9

PCA in application: Cartesian coordinates

• Cartesian coordinates contain all the information

• But they are often noisy

cPCA: Cartesian PCA, which uses Cartesian coordinates

Mu, Y., Nguyen, P. H., & Stock, G. (2005). Proteins, 58(1), 45–52.

Figure: comparison of cPCA and dPCA in the analysis of an Ala7 MD simulation. Dashed blue line: Cartesian PCA; solid red line: PCA using dihedral angles.

Page 10

PCA in application: cPCA, dPCA and pPCA

Advantages of dihedral angles:
1. reduction of dimensionality
2. constraints within the coordinates

Problems with dihedral angles:
1. dihedral angles are periodic
2. dihedral angles are not linear

In practice, each dihedral angle is transformed into its sine and cosine before doing PCA; this is called dPCA.
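The sine/cosine transformation can be sketched as below. Each dihedral angle φ becomes the pair (cos φ, sin φ), so n dihedrals give 2n coordinates, which then go through ordinary PCA. The example shows why this handles periodicity: 179° and -179° are only 2° apart on the circle but 358° apart as raw numbers, whereas their (cos, sin) points are close together. The function name and column order are illustrative choices, not taken from the cited papers.

```python
import numpy as np


def dihedral_features(angles_deg):
    """Map dihedral angles (n_frames, n_angles) in degrees to (cos, sin) pairs.

    Output shape is (n_frames, 2 * n_angles): for each angle, its cosine and
    sine occupy adjacent columns.
    """
    rad = np.deg2rad(angles_deg)
    features = np.empty((rad.shape[0], 2 * rad.shape[1]))
    features[:, 0::2] = np.cos(rad)
    features[:, 1::2] = np.sin(rad)
    return features


# 179° and -179° are 2° apart on the circle, but 358° apart as raw numbers
angles = np.array([[179.0], [-179.0]])
raw_gap = abs(angles[0, 0] - angles[1, 0])
feat = dihedral_features(angles)
feature_gap = np.linalg.norm(feat[0] - feat[1])  # chord length on the unit circle
print(raw_gap, feature_gap)
```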

Page 11

Application of dPCA: (Aβ16-22)6

Nguyen, P. H., Li, M. S., Stock, G., Straub, J. E., & Thirumalai, D. (2007). PNAS, 104(1), 111–6.

Free-energy diagram projected onto the first two principal components V1 and V2 of the dPCA for the hexamer.

Page 12

dPCA in RNA analysis: flexible choice of internal coordinates

Riccardi, L., Nguyen, P. H., & Stock, G. (2009). JPCB, 113(52), 16660–8.

Page 13

Using dPCA to compare the Trp-zip2 potential energy surface in different force fields

• REMD simulation of the short β-hairpin Trp-zip2 using:
  – ff99sb-ildn
  – ff99sb*-ildn
  – ff99sb-ildn-nmr
  – ff99C, our modified version of ff99sb-ildn

Page 14

Using dPCA to compare the Trp-zip2 potential energy surface in different force fields

Free energy landscape of Trp-zip2 at 300 K with the Amber force field 99sb*-ildn, projected onto the 1st and 2nd principal components from dPCA of the turn region. The landscape is extended because Trp-zip2 cannot form a stable hairpin in this force field.

Labeled regions in the figure: native-like turn; helical structure.

Page 15

Using dPCA to compare the Trp-zip2 potential energy surface in different force fields

The figure shows the free energy landscape of Trp-zip2 at 300 K with the Amber force field 99sb-ildn, projected onto the 1st and 2nd principal components obtained from the 99sb*-ildn simulation (dPCA of the turn region).

Labeled regions in the figure: native-like turn; helical structure.
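The captions on pages 15 to 17 project other force fields' trajectories onto the principal components of the 99sb*-ildn simulation. A minimal sketch of that step, assuming every simulation has been converted to the same feature set (for example, sin/cos of the same turn-region dihedrals): fit the mean and leading eigenvectors on the reference data only, then center and project the other data with those same quantities, so that all landscapes share one coordinate system. The function names are hypothetical.

```python
import numpy as np


def fit_reference_pcs(reference, n_components=2):
    """Mean and leading eigenvectors of a reference data set (n_frames, n_features)."""
    mean = reference.mean(axis=0)
    cov = np.cov(reference - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)            # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]  # indices of the largest ones
    return mean, eigvecs[:, order]


def project_onto_reference(data, mean, components):
    """Project another simulation's features onto the reference PCs.

    The reference mean (not the other data set's own mean) is subtracted, so
    shifts between simulations remain visible in the projection.
    """
    return (data - mean) @ components
```

Centering with the other simulation's own mean instead would hide any systematic shift between force fields, which is exactly what such a comparison is meant to reveal.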

Page 16

Using dPCA to compare the Trp-zip2 potential energy surface in different force fields

The figure shows the free energy landscape of Trp-zip2 at 300 K with the Amber force field 99sb-ildn-nmr, projected onto the 1st and 2nd principal components obtained from the 99sb*-ildn simulation (dPCA of the turn region). 99sb-ildn-nmr cannot fold the Trp-zip2 hairpin.

Labeled regions in the figure: native-like turn; helical structure.

Page 17

Using dPCA to compare the Trp-zip2 potential energy surface in different force fields

The figure shows the free energy landscape of Trp-zip2 at 300 K with force field 99C, projected onto the 1st and 2nd principal components obtained from the 99sb*-ildn simulation (dPCA of the turn region). In our force field, Trp-zip2 forms a stable β-turn, so it rarely samples other conformations.

Labeled region in the figure: native-like turn.

Page 18

Summary

• PCA is a linear transformation of the original coordinates that captures the maximum variance

• Instead of Cartesian coordinates, dihedral angles can be a better choice for describing conformational change

• General coordinates, or a subset of coordinates (for a region of interest), can be used for PCA

• The result of PCA can be used for further analysis such as clustering and transition rate calculation

Page 19

Thank you!