requiring two weeks on 300 cores to process a dataset with
200,000 images.
We introduce a framework for Cryo-EM density estima-
tion, formulating the problem as one of stochastic optimiza-
tion to perform maximum-a-posteriori (MAP) estimation in
a probabilistic model. The approach is remarkably efficient,
providing useful low resolution density estimates in an hour.
We also show that our stochastic optimization technique is
insensitive to initialization, allowing the use of random ini-
tializations. We further introduce a novel importance sam-
pling scheme that dramatically reduces the computational
costs associated with high resolution reconstruction. This
leads to speedups of 100,000-fold or more, allowing struc-
tures to be determined in a day on a modern workstation. In
addition, the proposed framework is flexible, allowing parts
of the model to be changed and improved without impacting
the estimation; e.g., we compare the use of three different
priors. To demonstrate our method, we perform reconstruc-
tions on two real datasets and one synthetic dataset.
2. Background and Related Work
In Cryo-EM, a purified solution of the target molecule
is cryogenically frozen into a thin (single molecule thick)
film, and imaged with a transmission electron microscope.
A large number of such samples are obtained, each of which
provides a micrograph containing hundreds of visible, in-
dividual molecules. In a process known as particle pick-
ing, individual molecules are selected, resulting in a stack
of cropped particle images. Particle picking is often done
manually; however, there have been recent moves to partially
or fully automate the process [17, 40]. Each particle image
provides a noisy view of the molecule, but with unknown
3D pose, see Fig. 2 (right). The reconstruction task is to es-
timate the 3D electron density of the target molecule from
the potentially large set of particle images.
Common approaches to Cryo-EM density estimation,
e.g., [7, 11, 37], use a form of iterative refinement. Based
on an initial estimate of the 3D density, they determine the
best matching pose for each particle image. A new density
estimate is then constructed using the Fourier Slice The-
orem (FST); i.e., the 2D Fourier transform of an integral
projection of the density corresponds to a slice through the
origin of the 3D Fourier transform of that density, in a plane
perpendicular to the projection direction [13]. Using the
3D pose for each particle image, the new density is found
through interpolation and averaging of the observed particle
images.
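As a concrete illustration, the Fourier Slice Theorem can be checked numerically on a toy volume. The sketch below is an independent illustration (not from the paper): it uses an axis-aligned projection so that the corresponding central slice of the 3D Fourier transform is trivial to index.

```python
import numpy as np

# Build a small random 3D density (D = 32) and verify the Fourier Slice
# Theorem numerically: the 2D FFT of an axis-aligned integral projection
# equals the central slice of the 3D FFT perpendicular to that axis.
D = 32
rng = np.random.default_rng(0)
V = rng.random((D, D, D))

# Integral projection along the first axis.
proj = V.sum(axis=0)

# 2D Fourier transform of the projection.
F_proj = np.fft.fftn(proj)

# Central slice of the 3D Fourier transform: zero frequency along axis 0.
F_vol = np.fft.fftn(V)
central_slice = F_vol[0, :, :]

assert np.allclose(F_proj, central_slice)
```

For an arbitrary projection direction the slice is an oblique plane through the origin, which is why practical reconstruction requires interpolation in Fourier space.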
This approach is fundamentally limited in several ways.
Even if one begins with the correct 3D density, the low SNR
of particle images makes accurately identifying the correct
pose for each particle nearly impossible. This problem is
exacerbated when the initial density is inaccurate. Poor
initializations result in estimated structures that are either
clearly wrong (see Fig. 9) or, worse, plausible-looking but
incorrect [12]. Finally, and crucially for the case of
density estimation with many particle images, all data are
used at each refinement iteration, causing these methods to
be extremely slow. Mallick et al. [25] proposed an approach
which attempted to establish weak constraints on the rela-
tive 3D poses between different particle images. This was
used to initialize an iterative refinement algorithm to pro-
duce a final reconstruction. In contrast, our refinement ap-
proach does not require an accurate initialization.
To avoid the need to estimate a single 3D pose for each
particle image, Bayesian approaches have been proposed in
which the 3D poses for the particle images are treated as la-
tent variables, and then marginalized out numerically. This
approach was originally proposed by Sigworth [35] for 2D
image alignment and later by Scheres et al. [33] for 3D
estimation and classification. It has since been used by Jaitly
et al. [14], where batch, gradient-based optimization was
performed. Nevertheless, due to the computational cost of
marginalization, the method was applied only to small numbers
of class-average images, which are produced by clustering,
aligning, and averaging individual particle images according
to their appearance in order to reduce noise and the number
of particle images used during the optimization. More
recently, pose marginalization was applied directly to particle
images, using a batch Expectation-Maximization algorithm
in the RELION package [34]. However, this approach
is extremely computationally expensive. The approach proposed
here uses a similar marginalized likelihood but, unlike
previous methods, employs stochastic rather than batch
optimization. We show that this allows for efficient
optimization, and for robustness with respect to initializa-
tion. We further introduce a novel importance sampling
technique that dramatically reduces the computational cost
of the marginalization when working at high resolutions.
3. A Framework for 3D Density Estimation
Here we present our framework for density estimation
which includes a probabilistic generative model of image
formation, stochastic optimization to cope with large-scale
datasets, and importance sampling to efficiently marginalize
over the unknown 3D pose of the particle in each image.
3.1. Image Formation Model
In Cryo-EM, particle images are formed as orthographic,
integral projections of the electron density of a molecule,
V ∈ R^{D^3}. In each image, the density is oriented in an
unknown pose, R ∈ SO(3), relative to the direction of the
microscope beam. The projection along this unknown direction
is a linear operator, represented by the matrix
P_R ∈ R^{D^2 × D^3}. Along with pose, the in-plane translation
t ∈ R^2 of each particle image is unknown, the effect of
(likelihood) then becomes

p(I | θ, V) ≈ Σ_{j=1}^{M_R} w^R_j Σ_{ℓ=1}^{M_t} w^t_ℓ p(I | θ, R_j, t_ℓ, V) p(R_j) p(t_ℓ)    (4)

where {(R_j, w^R_j)}_{j=1}^{M_R} are weighted quadrature points over
SO(3) and {(t_ℓ, w^t_ℓ)}_{ℓ=1}^{M_t} are weighted quadrature points
over R^2. The accuracy of the quadrature scheme, and consequently
the values of M_R and M_t, are set automatically
based on ω, the specified maximum frequency; higher values
of ω result in more quadrature points.
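To make the quadrature-based marginalization concrete, the following toy sketch evaluates a double weighted sum of the form in Eq. (4). All quantities are hypothetical stand-ins: uniform weights replace a real SO(3)/R^2 quadrature scheme, and a random table replaces the per-pose likelihood p(I | θ, R_j, t_ℓ, V).

```python
import numpy as np

# Toy evaluation of a double weighted quadrature sum as in Eq. (4).
rng = np.random.default_rng(1)
M_R, M_t = 50, 9                  # number of rotation / shift quadrature points

w_R = np.full(M_R, 1.0 / M_R)     # stand-in quadrature weights over SO(3)
w_t = np.full(M_t, 1.0 / M_t)     # stand-in quadrature weights over R^2
lik = rng.random((M_R, M_t))      # stand-in per-pose likelihood values

# p(I | theta, V) ~= sum_j w_R[j] * sum_l w_t[l] * lik[j, l]
marginal = w_R @ lik @ w_t
assert 0.0 < marginal < 1.0
```

The cost of this sum, M_R × M_t likelihood evaluations per image, is what the importance sampling scheme of Sec. 3.3 reduces.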
Given a set of K images with CTF parameters, D = {(I_i, θ_i)}_{i=1}^{K},
and assuming conditional independence of the
images, the posterior probability of a density V is

p(V | D) ∝ p(V) ∏_{i=1}^{K} p(I_i | θ_i, V)    (5)
where p(V) is a prior over 3D molecular electron densities.
Several choices of prior are explored below, but we found
that a simple independent exponential prior worked well.
Specifically, p(V) = ∏_{i=1}^{D^3} λ e^{−λ V_i}, where V_i is the value
of the i-th voxel and λ is the inverse scale parameter. Other
choices of prior are possible and are a promising direction
for future research.
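Note that the negative log of this exponential prior is, up to an additive constant, an L1 penalty λ Σ_i V_i on the (nonnegative) voxel values. The sketch below, with purely illustrative values, verifies this identity numerically:

```python
import numpy as np

# The independent exponential prior p(V) = prod_i lam * exp(-lam * V_i)
# contributes a negative log term that is, up to a constant, an L1 penalty
# lam * sum_i V_i on the nonnegative voxels. lam is the inverse scale
# parameter; its value here is an arbitrary illustration.
lam = 0.5
V = np.abs(np.random.default_rng(2).normal(size=(8, 8, 8)))  # toy density

neg_log_prior = -np.sum(np.log(lam) - lam * V)    # -log p(V)
l1_form = V.size * -np.log(lam) + lam * V.sum()   # constant + lam * ||V||_1
assert np.isclose(neg_log_prior, l1_form)
```

This makes the sparsity-inducing character of the prior explicit: it shrinks voxel values toward zero, which is physically sensible for densities that are mostly empty solvent.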
Estimating the density now corresponds to finding V which
maximizes Equation (5). Taking the negative log
and dropping constant factors, the optimization problem
becomes argmin_{V ∈ R^{D^3}_+} f(V), with

f(V) = −log p(V) − Σ_{i=1}^{K} log p(I_i | θ_i, V)    (6)
where V is restricted to be positive (negative density is phys-
ically unrealistic). Optimizing Eq. (6) directly is costly due
to the marginalization in Eq. (4) as well as the large num-
ber (K) of particle images in a typical dataset. To deal with
these challenges, the following sections propose the use of
two techniques, namely, stochastic optimization and impor-
tance sampling.
3.2. Stochastic Optimization
In order to efficiently cope with the large number of
particle images in a typical dataset, we propose the use
of stochastic optimization methods. These methods exploit
the large amount of redundancy in most datasets by
considering only a subset of the data (i.e., images) at each
iteration, rewriting the objective as f(V) = Σ_k f_k(V),
where each f_k(V) evaluates a subset of the data.
This allows fast progress to be made before a batch
optimization algorithm would be able to take a single step.
There are a wide range of such methods, ranging from
simple stochastic gradient descent with momentum [28, 29,
36] to more complex methods such as Natural Gradient
methods [2, 3, 19, 20] and Hessian-free optimization [26].
Here we propose the use of Stochastic Average Gradient
Descent (SAGD) [21] which has several important advan-
tages. First, it is effectively self-tuning, using a line-search
to determine and adapt the learning rate. This is particu-
larly important, as many methods require significant man-
ual tuning for new objective functions and, potentially, each
new dataset. Further, it is specifically designed for the finite
dataset case allowing for faster convergence.
At each iteration τ, SAGD [21] considers only a single
subset of data, k_τ, which defines part of the objective
function, f_{k_τ}(V), and its gradient, g_{k_τ}(V). The density V is then
updated as

V_{τ+1} = V_τ − (ε / (K L)) Σ_{j=1}^{K} dV^τ_j    (7)

where ε is a base learning rate, L is a Lipschitz constant of
g_k(V), and

dV^τ_k = g_k(V_τ) if k = k_τ, and dV^τ_k = dV^{τ−1}_k otherwise    (8)

is the most recent gradient evaluation of subset k at
iteration τ. This step can be computed efficiently by storing
the gradient of each observation and updating a running
sum each time a new gradient is seen. The Lipschitz
constant L is not generally known but can be estimated
using a line-search technique. Theoretically, convergence
occurs for values of ε ≤ 1/16 [21]; however, in practice larger
values at early iterations can be beneficial, thus we use
ε = max(1/16, 2^{1−⌊τ/150⌋}). To allow parallelization and
reduce the memory requirements of SAGD, the data is divided into
minibatches of 200 particle images. Finally, to enforce the
positivity of the density, negative values of V are truncated to
zero after each iteration. More details of the stochastic
optimization can be found in the Supplemental Material.
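The update rule of Eqs. (7)–(8) can be sketched on a toy problem as follows. This is an illustrative stand-in rather than the paper's implementation: least-squares terms replace the cryo-EM objective, the Lipschitz constant is computed in closed form instead of by line search, and the base learning rate is fixed.

```python
import numpy as np

# Minimal Stochastic Average Gradient sketch (Eqs. (7)-(8)) on a toy
# nonnegative least-squares problem with K data subsets.
rng = np.random.default_rng(3)
K, dim = 20, 5
A = rng.normal(size=(K, dim))
b = rng.normal(size=K)

def grad_k(V, k):
    # Gradient of f_k(V) = 0.5 * (A[k] @ V - b[k])^2
    return (A[k] @ V - b[k]) * A[k]

L = np.max(np.sum(A**2, axis=1))   # Lipschitz constant of the gradients
eps = 1.0 / 16                     # base learning rate from the text

V = np.zeros(dim)
dV = np.zeros((K, dim))            # stored most-recent gradient per subset
running_sum = np.zeros(dim)        # sum_j dV[j], maintained incrementally

for tau in range(2000):
    k = rng.integers(K)            # the single subset k_tau for this step
    g = grad_k(V, k)
    running_sum += g - dV[k]       # swap the stale gradient out of the sum
    dV[k] = g
    V = V - (eps / (K * L)) * running_sum
    V = np.maximum(V, 0.0)         # truncate negatives to enforce positivity
```

The running-sum trick is the key efficiency point: each iteration evaluates one new gradient yet the update uses (an average of) all K stored gradients.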
3.3. Importance Sampling
While stochastic optimization allows us to scale to large
datasets, the cost of computing the required gradient for
each image remains high due to the marginalization over
orientations and shifts. Intuitively, one could consider ran-
domly selecting a subset of the terms in Eq. (4) and using
this as an approximation. This idea is formalized by impor-
tance sampling (IS) which allows for an efficient and accu-
rate approximation of the discrete sums in Eq. (4).1 A full
review of importance sampling is beyond the scope of this
paper but we refer readers to [38].
1One can also apply importance sampling directly to the continuous
integrals in Eq. (3) but it can be computationally advantageous to precom-
pute a fixed set of projection and shift matrices, PR and St, which can be
reused across particle images.
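The key idea of the importance sampling scheme can be illustrated independently of Cryo-EM: a large weighted sum is estimated from a small sampled subset of its terms, each reweighted by its proposal probability. The proposal used below is a hypothetical stand-in for the scheme developed in the paper (which favors poses that matched well at lower resolution).

```python
import numpy as np

# Approximate a large discrete sum S = sum_j w[j] * f[j], as in the
# marginalization of Eq. (4), by importance sampling a subset of terms.
# Terms are drawn with probability q[j] and reweighted by 1/q[j], giving
# an unbiased estimator for any q > 0.
rng = np.random.default_rng(4)
M = 100_000                        # total number of quadrature terms
w = np.full(M, 1.0 / M)            # quadrature weights
f = rng.random(M)                  # stand-in per-term likelihood values

S_exact = np.sum(w * f)

# Hypothetical proposal: favor large-f terms; the additive smoothing keeps
# every term reachable so the estimator stays unbiased.
q = (f + 0.1) / (f + 0.1).sum()
n = 1000                           # evaluate only n of the M terms
idx = rng.choice(M, size=n, p=q)
S_est = np.mean(w[idx] * f[idx] / q[idx])

assert abs(S_est - S_exact) / S_exact < 0.05
```

With a proposal concentrated on the terms that dominate the sum, a few hundred evaluations can replace tens of thousands, which is the source of the speedups reported in Sec. 1.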
References
[1] S. Agarwal, N. Snavely, I. Simon, S. M. Seitz, and
R. Szeliski, “Building rome in a day,” in ICCV, 2009. 1
[2] S.-I. Amari, H. Park, and K. Fukumizu, “Adaptive method
of realizing natural gradient learning for multilayer percep-
trons,” Neural Computation, vol. 12, no. 6, pp. 1399–1409,
2000. 4
[3] S. Amari, “Natural gradient works efficiently in learning,”
Neural Computation, vol. 10, no. 2, pp. 251–276, 1998. 4
[4] W. T. Baxter, R. A. Grassucci, H. Gao, and J. Frank, “Deter-
mination of signal-to-noise ratios and spectral snrs in cryo-
em low-dose imaging of molecules,” J Struct Biol, vol. 166,
no. 2, pp. 126–32, May 2009. 1
[5] J. Besag, “Statistical analysis of non-lattice data,” Journal
of the Royal Statistical Society. Series D (The Statistician),
vol. 24, no. 3, pp. 179–195, 1975. 7
[6] R. Bhotika, D. Fleet, and K. Kutulakos, “A probabilistic the-
ory of occupancy and emptiness,” in ECCV, 2002. 1
[7] J. de la Rosa-Trevín, J. Otón, R. Marabini, A. Zaldívar,
J. Vargas, J. Carazo, and C. Sorzano, “Xmipp 3.0: An im-
proved software suite for image processing in electron mi-
croscopy,” Journal of Structural Biology, vol. 184, no. 2, pp.
321–328, 2013. 1, 2
[8] A. Doucet, S. Godsill, and C. Andrieu, “On sequential Monte
Carlo sampling methods for Bayesian filtering,” Statistics
and Computing, vol. 10, no. 3, pp. 197–208, 2000. 5
[9] M. Graf and D. Potts, “Sampling sets and quadrature formu-
lae on the rotation group,” Numerical Functional Analysis
and Optimization, vol. 30, no. 7-8, pp. 665–688, 2009. 3
[10] J. Gregson, M. Krimerman, M. B. Hullin, and W. Heidrich,
“Stochastic tomography and its applications in 3d imaging
of mixing fluids,” ACM Trans. Graph. (Proc. SIGGRAPH
2012), vol. 31, no. 4, pp. 52:1–52:10, 2012. 1
[11] N. Grigorieff, “Frealign: high-resolution refinement of sin-
gle particle structures,” J Struct Biol, vol. 157, no. 1, p.
117–125, Jan 2007. 1, 2, 7
[12] R. Henderson, A. Sali, M. L. Baker, B. Carragher, B. De-
vkota, K. H. Downing, E. H. Egelman, Z. Feng, J. Frank,
N. Grigorieff, W. Jiang, S. J. Ludtke, O. Medalia, P. A.
Penczek, P. B. Rosenthal, M. G. Rossmann, M. F. Schmid,
G. F. Schroder, A. C. Steven, D. L. Stokes, J. D. Westbrook,
W. Wriggers, H. Yang, J. Young, H. M. Berman, W. Chiu,
G. J. Kleywegt, and C. L. Lawson, “Outcome of the first
electron microscopy validation task force meeting,” Struc-
ture, vol. 20, no. 2, pp. 205 – 214, 2012. 1, 2, 7
[13] J. Hsieh, Computed Tomography: Principles, Design, Arti-
facts, and Recent Advances. SPIE, 2003. 1, 2
[14] N. Jaitly, M. A. Brubaker, J. Rubinstein, and R. H. Lilien, “A
Bayesian Method for 3-D Macromolecular Structure Infer-
ence using Class Average Images from Single Particle Elec-
tron Microscopy,” Bioinformatics, vol. 26, pp. 2406–2415,
2010. 2
[15] J. Keeler, Understanding NMR Spectroscopy. Wiley, 2010.
1
[16] K. N. Kutulakos and S. M. Seitz, “A theory of shape by space
carving,” IJCV, vol. 38, no. 3, pp. 199–218, 2000. 1
[17] R. Langlois, J. Pallesen, J. T. Ash, D. N. Ho, J. L. Rubin-
stein, and J. Frank, “Automated particle picking for low-
contrast macromolecules in cryo-electron microscopy,” Jour-
nal of Structural Biology, vol. 186, no. 1, pp. 1 – 7, 2014. 2
[18] W. C. Y. Lau and J. L. Rubinstein, “Subnanometre-resolution
structure of the intact Thermus thermophilus H+-driven ATP
synthase,” Nature, vol. 481, pp. 214–218, 2012. 3, 5, 6
[19] N. Le Roux and A. Fitzgibbon, “A fast natural Newton
method,” in ICML, 2010. 4
[20] N. Le Roux, P.-A. Manzagol, and Y. Bengio, “Topmoumoute
online natural gradient algorithm,” in NIPS, 2008, pp. 849–
856. 4
[21] N. Le Roux, M. Schmidt, and F. Bach, “A stochastic gradi-
ent method with an exponential convergence rate for strongly
convex optimization with finite training sets,” in NIPS, 2012.
4
[22] V. J. Lebedev and D. N. Laikov, “A quadrature formula for
the sphere of the 131st algebraic order of accuracy,” Doklady
Mathematics, vol. 59, no. 3, pp. 477 – 481, 1999. 3
[23] X. Li, P. Mooney, S. Zheng, C. R. Booth, M. B. Braunfeld,
S. Gubbens, D. A. Agard, and Y. Cheng, “Electron count-
ing and beam-induced motion correction enable near-atomic-
resolution single-particle cryo-em,” Nature Methods, vol. 10,
no. 6, pp. 584–590, 2013. 8
[24] S. P. Mallick, B. Carragher, C. S. Potter, and D. J. Kriegman,
“ACE: Automated CTF estimation,” Ultramicroscopy, vol.
104, no. 1, pp. 8–29, 2005. 3
[25] S. P. Mallick, S. Agarwal, D. J. Kriegman, S. J. Belongie,
B. Carragher, and C. S. Potter, “Structure and view estima-
tion for tomographic reconstruction: A bayesian approach,”
in CVPR, 2006. 2
[26] J. Martens, “Deep learning via hessian-free optimization,” in
ICML, 2010. 4
[27] J. A. Mindell and N. Grigorieff, “Accurate determination
of local defocus and specimen tilt in electron microscopy,”
Journal of Structural Biology, vol. 142, no. 3, pp. 334–47,
2003. 3
[28] Y. Nesterov, “A method of solving a convex programming
problem with convergence rate O(1/k²),” Soviet Mathematics
Doklady, vol. 27, no. 2, pp. 372–376, 1983. 4
[29] B. Polyak, “Some methods of speeding up the convergence
of iteration methods,” USSR Computational Mathematics
and Mathematical Physics, vol. 4, no. 5, pp. 1–17, 1964. 4
[30] L. Reimer and H. Kohl, Transmission Electron Microscopy:
Physics of Image Formation. Springer, 2008. 3
[31] J. L. Rubinstein, J. E. Walker, and R. Henderson, “Struc-
ture of the mitochondrial atp synthase by electron cryomi-
croscopy,” The EMBO Journal, vol. 22, no. 23, pp. 6182–
6192, 2003. 5, 6, 7
[32] B. Rupp, Biomolecular Crystallography: Principles,
Practice and Application to Structural Biology. Garland
Science, 2009. 1
[33] S. H. W. Scheres, H. Gao, M. Valle, G. T. Herman, P. P. B.
Eggermont, J. Frank, and J.-M. Carazo, “Disentangling con-
formational states of macromolecules in 3d-em through like-
lihood optimization,” Nature Methods, vol. 4, pp. 27–29,
2007. 2, 3
[34] S. H. Scheres, “RELION: Implementation of a Bayesian
approach to cryo-EM structure determination,” Journal of
Structural Biology, vol. 180, no. 3, pp. 519–530, 2012. 1,
2, 7
[35] F. Sigworth, “A maximum-likelihood approach to single-
particle image refinement,” Journal of Structural Biology,
vol. 122, no. 3, pp. 328 – 339, 1998. 2, 3
[36] I. Sutskever, J. Martens, G. Dahl, and G. Hinton, “On the im-
portance of initialization and momentum in deep learning,”
in ICML, 2013. 4
[37] G. Tang, L. Peng, P. Baldwin, D. Mann, W. Jiang, I. Rees,
and S. Ludtke, “EMAN2: an extensible image processing
suite for electron microscopy,” Journal of Structural Biology,
vol. 157, no. 1, pp. 38–46, 2007. 1, 2
[38] S. T. Tokdar and R. E. Kass, “Importance sampling: a
review,” Wiley Interdisciplinary Reviews: Computational
Statistics, vol. 2, no. 1, pp. 54–60, 2010. 4
[39] Z. Xu, A. L. Horwich, and P. B. Sigler, “The crystal structure
of the asymmetric GroEL-GroES-(ADP)7 chaperonin com-
plex,” Nature, vol. 388, pp. 741–750, 1997. 5, 6
[40] J. Zhao, M. A. Brubaker, and J. L. Rubinstein, “TMaCS:
A hybrid template matching and classification system for
partially-automated particle selection,” Journal of Structural
Biology, vol. 181, no. 3, pp. 234 – 242, 2013. 2
Acknowledgements This work was supported in part by
NSERC Canada and the CIFAR NCAP Program. MAB was
funded in part by an NSERC Postdoctoral Fellowship. The au-
thors would like to thank John L. Rubinstein for providing data and
invaluable feedback.