
Deep learning for single-shot autofocus microscopy: supplementary material

HENRY PINKARD1,2,3,4*, ZACHARY PHILLIPS5, ARMAN BABAKHANI6, DANIEL A. FLETCHER7,8, AND LAURA WALLER2,3,5,8

1 Computational Biology Graduate Group, University of California, Berkeley, CA 94720, USA
2 Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720, USA
3 Berkeley Institute for Data Science, Berkeley, CA 94720, USA
4 University of California San Francisco Bakar Computational Health Sciences Institute, San Francisco, CA 94158, USA
5 Graduate Group in Applied Science and Technology, University of California, Berkeley, CA 94720, USA
6 Department of Physics, University of California, Berkeley, CA 94720, USA
7 Department of Bioengineering and Biophysics Program, University of California, Berkeley, CA 94720, USA
8 Chan Zuckerberg Biohub, San Francisco, CA 94158, USA
* Corresponding author: [email protected]

Published 5 June 2019

This document provides supplementary information to “Deep learning for single-shot autofocus microscopy,” https://doi.org/10.1364/OPTICA.6.000794. It contains information on how to implement our technique on new microscopes, including additional experiments testing which illumination patterns perform best, as well as detailed information about the neural network architecture used and how it compares to convolutional neural network architectures.

NOTE S1: PRACTICAL ASPECTS OF IMPLEMENTING ON A NEW MICROSCOPE

Hardware/Illumination In order to generate data using our method, the microscope must be able to image samples with two different contrast modalities: one with spatially incoherent illumination for computing ground truth focal position from image blur, and a second with coherent or nearly coherent illumination (i.e. one or a few LEDs) as input to the neural network. The incoherent illumination can be accomplished with the regular brightfield illumination of a transmitted light microscope in the case of samples that absorb light. In the case of transparent phase-only samples (like the ones used in our experiments), incoherent phase contrast can be created by using any asymmetric illumination pattern. We achieved this by using a half-annulus pattern on a programmable LED illuminator, but this specific pattern is not necessary. The same effect can be achieved by blocking out half of the illumination aperture of a microscope condenser with a piece of cardboard [1] or other means of achieving asymmetric illumination. The asymmetric incoherent illumination is only needed for the generation of training data, so it does not need to be permanent.

For the spatially coherent illumination, a single LED pointed at the sample from an oblique angle (i.e. not directly above) generates sufficient contrast, as shown in the main paper. However, our experiments with different multi-LED patterns (see Note S2) indicate that a series of LEDs arranged in a line might be even better for this purpose.

Software Our implementation used a stack of open-source acquisition control software based on Micro-Manager [2] and the plugin for high-throughput microscopy, Micro-Magellan [3]. Both are agnostic to specific hardware, and can thus be implemented on any microscope to easily collect training data. Automated LED illumination in Micro-Manager can be configured using a simple circuit connected to an Arduino and the Micro-Manager device adapter for controlling digital IO. Large numbers of focal stacks can be collected in an automated way using the 3D imaging capabilities of Micro-Magellan, and a Python reader for Micro-Magellan data allows for easy integration of data into deep learning frameworks. Examples of this can be seen in the Jupyter notebook: H. Pinkard, "Single-shot autofocus microscopy using deep learning–code," (2019), https://doi.org/10.6084/m9.figshare.7453436.v1.
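To make the handoff from acquisition to a deep learning framework concrete, the following is a minimal sketch of turning one focal stack (loaded with the Micro-Magellan Python reader referenced above, or by any other means) into (image, defocus) training pairs. The function and variable names are illustrative, not part of any actual API; see the linked notebook for the implementation actually used.

```python
import numpy as np

def make_training_pairs(stack, z_positions_um, true_focus_um):
    """Pair each coherent image in a focal stack with its signed defocus
    (in µm) relative to the ground-truth focal plane determined from the
    incoherent (asymmetric-illumination) images."""
    defocus_um = np.asarray(z_positions_um) - true_focus_um  # regression targets
    return np.asarray(stack, dtype=np.float32), defocus_um

# e.g., stack: (n_z, H, W) array of single-LED images from one XY position
# images, targets = make_training_pairs(stack, z_positions_um, true_focus_um)
```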


Other imaging geometries Although we have demonstrated this technique on a transmitted light microscope with LED illumination, in theory there is no reason why it couldn't be applied to other coherent illuminations and geometries. For example, using a laser instead of an LED as a coherent illumination source should be possible with minimal modification. We've also demonstrated the technique using relatively thin samples. Autofocusing methods like ours are generally not directly applicable to thick samples, since it is difficult to define the ground truth focal plane of a thick sample in a transmitted light configuration. However, in principle it is possible that these methods could be used in a reflected light geometry, where the "true" focal plane corresponds to the top of the sample.

NOTE S2: CHOOSING AN ILLUMINATION PATTERN

Although the network is capable of learning to predict defocus from images taken under the illumination of a single off-axis LED, as shown in the main paper, different angles or combinations of angles of illumination might contain more useful information for prediction. A more informative illumination pattern can make the prediction task more accurate, easier, and learnable with less training data. Since our experimental setup uses a programmable LED array quasi-dome as an illumination source [4], we can choose the source patterns at will to test this. First, restricting the analysis to one LED at a time, we tested how the angle of the single-LED illumination affects performance (Fig. S1a). We found that performance improves with increasing angle of illumination, up to a point at which it rapidly degrades. This drop-off occurs in the 'darkfield' region (where the illumination angle is larger than the objective's NA), likely due to the low signal-to-noise ratio (SNR) of the higher-angle darkfield images (see inset images in Fig. S1a). This drop in SNR could plausibly be caused either by a decrease in the number of photons hitting the sample from higher-angle LEDs, or by a drop in the high-frequency content of the sample itself. To rule out the first possibility, we compensated for the expected number of photons incident on a unit area of the sample, which falls off approximately in proportion to cos(θ), where θ is the angle of illumination relative to the optical axis [5]. The dataset used here increases exposure time in proportion to 1/cos(θ) in order to compensate for this. Thus, the degradation of performance at high angles is most likely due to the amount of high-frequency content in the sample itself at these angles, and therefore might be sample-specific.

Next, we tested 18 different single- or multi-LED source patterns chosen from within the distribution of x- and y-axis-aligned LEDs available on our quasi-dome (Fig. S1b,c). Since the light from any two LEDs is mutually incoherent, single-LED images can be added digitally to synthesize the image that would have been produced with multiple-LED illumination. This enabled us to computationally experiment with different illumination types on the same sample. Figure S1c shows the defocus prediction performance of various patterns of illumination. The best performing patterns were those that contained multiple LEDs arranged in a line. Given that specific parts of the Fourier transform contain important information for defocus prediction, and that these areas move to different parts of Fourier space with different angles of illumination, we speculate that the line of LEDs helps spread relevant information for defocus prediction into different parts of the spectrum. Although this analysis suggests that patterns with more LEDs and higher-angle LEDs yield superior performance, there are two potential caveats, discussed below.
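The digital synthesis described above relies only on the fact that intensities from mutually incoherent sources add. A minimal sketch, assuming the single-LED images for one field of view are stored in a NumPy array indexed by LED (array and index names are illustrative):

```python
import numpy as np

def synthesize_pattern(single_led_images, led_indices):
    """Simulate the image under a multi-LED pattern by summing the
    (mutually incoherent) single-LED intensity images."""
    return np.sum(single_led_images[list(led_indices)], axis=0)

# e.g., a line of LEDs along one axis of the quasi-dome (indices illustrative):
# line_image = synthesize_pattern(stack, led_indices=[3, 7, 12, 18])
```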

[Fig. S1 graphic: panel (a) plots validation RMSE (µm) versus single-LED illumination angle (NA); panel (b) is an LED array diagram in angle-x/angle-y (NA) coordinates showing the brightfield LEDs and the objective NA; panel (c) plots validation RMSE (µm) for 1-, 2-, 3-, 4-, 9-, and 18-LED patterns.]

Fig. S1. Illumination design. a) Increasing the numerical aperture (NA) (i.e. the angle relative to the optical axis) of single-LED illumination increases the accuracy of defocus predictions, up to a point at which it degrades. b) Diagram of LED placements in NA space for our LED quasi-dome. c) Defocus prediction performance for different illumination patterns. Patterns with multiple LEDs in an asymmetric line show the lowest error.

In the former case (more LEDs), the advantage could fail to hold when applied to a denser sample (i.e. not a sparse distribution of cells); in the latter (higher angles), there is the cost of the increased exposure time needed to acquire such images.

NOTE S3: FULLY CONNECTED FOURIER NEURAL NETWORK ARCHITECTURE

The fully connected Fourier neural network (FCFNN) begins with a single coherent intensity image captured by the microscope. This image is Fourier transformed, and the magnitudes of the complex-valued pixels in the central part of the Fourier transform are reshaped into a single vector. The useful part of the Fourier transform is directly related to the angle of coherent illumination (Fig. S2). A coherent illumination source such as an LED that is within the range of brightfield angles for the given objective (i.e. at an angle less than the maximum captured angle as determined by the objective's NA) will display a characteristic two-circle structure in its Fourier transform magnitude. The two circles contain information corresponding to the singly-scattered light from the sample, and they move farther apart as the angle of illumination increases. The neural network input should consist of half of the pixels in which these circles lie, because, as the saliency map in Fig. 4b of the main text demonstrates, they contain the useful information for predicting defocus. Only half the pixels are needed because the Fourier transform of a real-valued input (i.e. an intensity image) has symmetric magnitudes, so the other half contains redundant information. These circles move with changing illumination angle, so the angle of illumination and the relevant pixels must be selected together.

After cropping out the relevant pixels and reshaping them into a vector, the vector is normalized to have unit mean in order to account for differences in illumination brightness.
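As a concrete illustration of this preprocessing, here is a minimal sketch in Python/NumPy. The crop size and the choice of which half-plane and sub-region to keep are illustrative assumptions; in practice they must be matched to the actual illumination angle and objective NA, as described above.

```python
import numpy as np

def extract_fourier_features(image, crop_size=64):
    """Crop the defocus-relevant region of the Fourier transform magnitude
    and flatten it into a normalized feature vector (illustrative sketch)."""
    # Centered 2D FFT of the coherent intensity image
    ft = np.fft.fftshift(np.fft.fft2(image))
    mag = np.abs(ft)

    # Keep only one half-plane: the FT of a real image has symmetric
    # magnitudes, so the other half is redundant
    cy, cx = np.array(mag.shape) // 2
    half = mag[:, cx:]

    # Crop a central region around where the two-circle structure lies
    # (the exact region depends on LED angle and objective NA; assumed here)
    crop = half[cy - crop_size:cy + crop_size, :crop_size]

    # Flatten and normalize to unit mean to remove brightness differences
    vec = crop.ravel()
    return vec / vec.mean()
```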


[Fig. S2 graphic: schematics of slightly off-axis and far off-axis illumination through the objective lens, the corresponding Fourier transforms, and the regions to use as neural network input.]

Fig. S2. Fourier transform regions to use as network input. Off-axis illumination with a coherent point source at an angle within the numerical aperture of the collection objective produces a characteristic two-circle structure in the log magnitude of the Fourier transform of the captured image. As the angle of illumination increases, these circles move farther apart. Information about single-scattering events is confined within these circles. The blue regions represent the pixels that should be cropped out and fed into the neural network architecture.

This normalized vector is then used as the input layer of a neural network trained in TensorFlow [6]. The learnable part of the FCFNN consists of a series of small (100-unit) fully connected layers, followed by a single scalar output (the defocus prediction).

We experimented with several hyperparameters and regularization methods to improve performance on our training data. The most successful of these were: 1) Changing the number and width of the fully connected layers: we started small and increased both until this ceased to improve performance, which occurred at 10 fully connected layers of 100 units each. 2) Applying dropout [7] to the vectorized Fourier transform input layer (but not other layers) to prevent overfitting to specific parts of the Fourier transform. 3) Dividing the input image into patches and averaging the predictions over the patches; this gave the best performance when we divided the 2048x2048 image into 1024x1024 patches. 4) Using only the central part of the Fourier transform magnitude as the input vector; we manually tested how much of the edges to crop out. 5) Early stopping, when loss on a held-out validation set ceased to improve, which improved test performance.
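For reference, a minimal sketch of this learnable portion in tf.keras is shown below. The input-vector length, dropout rate, ReLU activations, Adam optimizer, and MSE loss are assumptions/placeholders, since they are not specified in this document; the layer count and width follow the description above.

```python
import tensorflow as tf

def build_fcfnn(input_dim, dropout_rate=0.2, n_layers=10, width=100):
    """Learnable part of the FCFNN: dropout on the Fourier-magnitude input
    vector, 10 fully connected layers of 100 units, and a scalar output."""
    inputs = tf.keras.Input(shape=(input_dim,))
    x = tf.keras.layers.Dropout(dropout_rate)(inputs)  # dropout on input only
    for _ in range(n_layers):
        x = tf.keras.layers.Dense(width, activation="relu")(x)
    defocus = tf.keras.layers.Dense(1)(x)  # scalar defocus prediction (µm)
    return tf.keras.Model(inputs, defocus)

model = build_fcfnn(input_dim=8192)  # placeholder feature-vector length
model.compile(optimizer="adam", loss="mse")

# Early stopping on a held-out validation set, as described above:
early_stop = tf.keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)
# model.fit(train_vecs, train_defocus, validation_data=(val_vecs, val_defocus),
#           callbacks=[early_stop], epochs=500)
```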

In general, we observed better performance training on noisier and more varied inputs (i.e. cells at different densities, particularly lower densities, and different exposure times). This is consistent with other results in deep learning, where adding noise to training data improves performance [8].

NOTE S4: COMPARISON OF FCFNNS AND CNNS

The FCFNN differs substantially from the convolutional neural networks (CNNs) that are the state of the art in image processing tasks. Typically, to solve a many-to-one regression task of predicting a scalar from an image, as in the defocus prediction problem here, CNNs first use a series of convolutional blocks with learnable weights to extract relevant features from the image.

Table S1. Comparison of number of learnable weights and memory usage

Architecture              Image size   # of learnable weights   Memory per training example
FCFNN (ours)              1024x1024    6.3 x 10^6               18 KB
CNN (Ren et al. [9])      1000x1000    2.5 x 10^8               111 MB
FCFNN (ours)              1000x1000    6.0 x 10^6               17 KB
CNN (Yang et al. [10])    84x84        2.9 x 10^7               1.3 MB
FCFNN (ours)              84x84        3.6 x 10^4               3 KB

These features are then often passed through a series of fully connected layers to generate a scalar prediction [11]. Here, we have replaced the feature-learning part of the network with a deterministic feature extraction module that uses only the physically relevant parts of the Fourier transform.

Deterministically downsampling images into feature vectors early in the network reduces the required number of learnable weights, and the memory used by the backpropagation algorithm to compute gradients during training, by 2-3 orders of magnitude. Table S1 shows a comparison between our FCFNN and two CNNs used for comparable tasks. The architecture of Ren et al. [9] is used for post-acquisition defocus prediction in digital holography, and the architecture of Yang et al. [10] is used for post-acquisition classification of images as in-focus or out-of-focus. Both use the conventional CNN paradigm of a series of convolutional blocks followed by one or more fully connected layers.
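As a rough illustration of where the FCFNN weight counts in Table S1 come from, the small helper below counts dense-layer weights for the layout described in Note S3 (10 hidden layers of 100 units plus a scalar output). The ~62,500-element input vector is an assumption chosen only to reproduce the order of magnitude of the 6.3 x 10^6 figure; it is not a value taken from the paper.

```python
def fcfnn_param_count(input_dim, n_layers=10, width=100):
    """Count learnable weights (including biases) in the FCFNN's dense layers."""
    count = input_dim * width + width                   # input vector -> first layer
    count += (n_layers - 1) * (width * width + width)   # remaining hidden layers
    count += width + 1                                  # final scalar output
    return count

# Assuming ~62,500 retained Fourier-magnitude pixels (illustrative only):
print(fcfnn_param_count(62_500))  # ~6.3 million, same order as Table S1
```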

Similar to CNNs, our FCFNN can also incorporate information from different parts of the full image. CNNs do this with a series of convolutional blocks that gradually expand the size of the receptive fields. The FCFNN does this inherently by use of the Fourier transform: each pixel in the Fourier transform corresponds to a sinusoid of a certain frequency and orientation in the original image, so its magnitude draws information from every pixel.
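This global dependence is easy to verify numerically: a single Fourier coefficient is a weighted sum over every pixel in the image. The short check below (with an arbitrary random image and an arbitrary spatial frequency) confirms this against NumPy's FFT.

```python
import numpy as np

# Each Fourier coefficient is a weighted sum over every image pixel, so each
# retained Fourier-magnitude value "sees" the entire field of view.
rng = np.random.default_rng(0)
img = rng.random((64, 64))
u, v = 5, 3  # an arbitrary spatial frequency (cycles across the image)
y, x = np.mgrid[0:64, 0:64]
coeff_manual = np.sum(img * np.exp(-2j * np.pi * (u * y + v * x) / 64))
coeff_fft = np.fft.fft2(img)[u, v]
assert np.allclose(coeff_manual, coeff_fft)
```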

REFERENCES

1. S. B. Mehta and C. J. R. Sheppard, "Quantitative phase-gradient imaging at high resolution with asymmetric illumination-based differential phase contrast," Opt. Lett. 34, 1924 (2009).

2. A. Edelstein, N. Amodaj, K. Hoover, R. Vale, and N. Stuurman, "Computer Control of Microscopes Using Micro-Manager," Curr. Protoc. Mol. Biol. pp. 1–17 (2010).


3. H. Pinkard, N. Stuurman, K. Corbin, R. Vale, and M. F. Krummel, "Micro-Magellan: open-source, sample-adaptive, acquisition software for optical microscopy," Nat. Methods 13, 807–809 (2016).

4. Z. F. Phillips, R. Eckert, and L. Waller, "Quasi-Dome: A Self-Calibrated High-NA LED Illuminator for Fourier Ptychography," in Imaging and Applied Optics 2017 (3D, AIO, COSI, IS, MATH, pcAOP), paper IW4E.5 (2017).

5. Z. F. Phillips, M. V. D'Ambrosio, L. Tian, J. J. Rulison, H. S. Patel, N. Sadras, A. V. Gande, N. A. Switz, D. A. Fletcher, and L. Waller, "Multi-Contrast Imaging and Digital Refocusing on a Mobile Microscope with a Domed LED Array," PLoS ONE 10, e0124938 (2015).

6. M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, J. Shlens, B. Steiner, I. Sutskever, P. Tucker, V. Vanhoucke, V. Vasudevan, O. Vinyals, P. Warden, M. Wicke, Y. Yu, and X. Zheng, "TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems," Tech. rep., Google Research (2015).

7. N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A Simple Way to Prevent Neural Networks from Overfitting," J. Mach. Learn. Res. 15, 1929–1958 (2014).

8. P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, "Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion," J. Mach. Learn. Res. 11, 3371–3408 (2010).

9. Z. Ren, Z. Xu, and E. Y. Lam, "Learning-based nonparametric autofocusing for digital holography," Optica 5, 337 (2018).

10. S. J. Yang, M. Berndl, D. M. Ando, M. Barch, A. Narayanaswamy, E. Christiansen, S. Hoyer, C. Roat, J. Hung, C. T. Rueden, A. Shankar, S. Finkbeiner, and P. Nelson, "Assessing microscope image focus quality with deep learning," BMC Bioinforma. 19, 28 (2018).

11. A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," Adv. Neural Inf. Process. Syst. pp. 1–9 (2011).