
JITTERED EXPOSURES FOR LIGHT FIELD SUPER-RESOLUTION

Nianyi Li† Scott McCloskey⋆ Jingyi Yu†

† University of Delaware, Newark, DE, USA
⋆ Honeywell ACST, 1985 Douglas Drive North, Golden Valley, MN, USA

ABSTRACT

Camera design involves tradeoffs between spatial, temporal, and angular resolution. Microlens-based light field cameras, for example, trade a fixed proportion of a sensor's spatial resolution for angular resolution in order to enable refocusing, perspective change, and depth estimation. In this paper, we exploit stabilizing optics already included in many lenses to efficiently trade temporal resolution and regain some of the lost spatial resolution. Since temporal resolution loss is a function of both the image capture time and image processing, we emphasize our reduced algorithmic complexity compared to existing light field super-resolution methods while still providing the same level of optical fidelity. We quantify performance with improved depth estimation and improved optical quality, measured on real images captured with a prototype 3D light field camera.

1. INTRODUCTION

Light field camera design trades off between spatial and angular resolution [1]. Take, as an example, the Lytro light field camera, which has a roughly 380 × 380 microlens array attached to an 11MP sensor. Each microlens covers about 81 pixels, giving 9 × 9 angular resolution, but the camera only delivers images with spatial resolution of 700 × 700. Different microlens arrays can limit resolution loss by covering fewer pixels with each lens, but doing so reduces the angular resolution. In this paper, we demonstrate that we can recover spatial resolution lost to the microlens array by moving either the lenslets or an image stabilization lens with sub-lenslet precision. Since most microlens arrays are rigidly mounted to the sensor, we use readily available image stabilization (IS) hardware, which exists in most cameras and smartphones, to perform physics-based SR for light fields, as shown in Fig. 1. We implement this with a modified IS lens and a DSLR camera body whose sensor has an attached microlens array.

Understanding that super-resolution (SR) involves a fundamental tradeoff with temporal resolution, we endeavor to make that tradeoff as efficient as possible, considering both capture and processing times. Existing Light Field (LF) SR methods are computationally very intensive, and therefore are ill-suited to mobile phone cameras, where excessive processing reduces battery life. Instead, our IS-based method super-resolves optically using hardware which is already present on most flagship phone cameras.


Fig. 1. Our 3D light field camera prototype, which consists of a cylindrical array of lenslets attached to the sensor of a Canon Rebel T2i with our modified stabilizing lens.

While we demonstrate our LF SR method with a 3D LF camera having cylindrical microlenses, adapting our method to a traditional plenoptic camera constructed with a 2D lenslet array is straightforward. Experimentally, we demonstrate better depth estimation from super-resolved views, as shown in Fig. 4, and improved resolution in the individual LF views.

2. RELATED WORK

For light fields, prior work has introduced both algorithmic and physics-based SR methods. Most of the algorithmic methods [2–4] involve a complex depth estimation procedure, which plays a key role in determining SR performance. Other methods employ optical components which attenuate light [5], thereby exacerbating the loss of temporal resolution via the capture channel. The DSLR-resolution light field camera [6], for instance, has significant vignetting from an external aperture used to sequentially capture the views, and Liquid Crystal on Silicon (LCoS) arrangements [7] lose light by using polarizing filters. The hybrid light field camera [4] (using both a Lytro and a DSLR camera) captures high spatial and angular resolution at high hardware expense and also involves complex algorithms which reduce temporal resolution. However, we demonstrate that consumer-grade optical stabilization hardware provides enough precision to enable physically-correct SR, obviating both the expensive custom hardware and the additional algorithmic complexity of LF SR.



Fig. 2. Sub-lenslet light field SR. (a) 2D schematic of a plenoptic camera (lens sizes are exaggerated for illustration). By moving the microlens array, the sub-aperture views on the main lens capture more rays of the light field. (b) Ray space diagram. The LF camera integrates rays at each sensor element (blue and red 2D squares). The captured sub-lenslet-aliased light field (red grid) integrates different information than the original one (blue grid). (c) The LF SR result from multiple LR light fields with sub-lenslet shifts. SR results of the central view (green outline) vs. the original LR views (blue and red frames).

3. JITTERED LENS LIGHT FIELD SR

We first introduce our lens control arrangement and the 3D light field camera that we use to demonstrate light field super-resolution. We then show how the combined system can be used to capture shifted light fields which, when interleaved, preserve a higher spatial resolution without the need for a complex, battery-draining algorithm.

3.1. Lens Control

Fig. 1 shows the prototype of our camera where, once the shutter button has been pressed, the hot shoe signal triggers an external stabilizer controller to move the IS lens to a position of our choosing. Note that if we could access the API of the stabilizer controller within the camera, we would not need to modify the camera body at all. In that sense, it is not necessary to add any additional hardware to achieve LF SR beyond the traditional trade-off.

We mounted a modified Canon EF 100mm f/2.8L Macro IS lens on the camera body. The Canon OIS module includes a light-weight lens encased in a housing with a metallic yoke, whose motion is achieved by modulating a voice coil. When the shutter release is depressed halfway, a mechanical lock is released and the lens is moved by a processor via pulse width modulation. As described in [8], the standard mode of that processor is a closed control loop in which motion detected by gyroscopes mounted within the lens induces a compensating motion of the stabilizing element. In order to drive the lens to the desired positions while capturing our LR images, we break the control loop and decouple the stabilizing element from the motion sensor. We use an independent microprocessor to control the movement of the IS lens through pulse width modulation supervised by a PID control loop. Once the shutter button is pressed, the hot shoe signal initiates our controller to hold the lens at a programmed position. Note that we have previously used this same hardware for 2D image SR in [9].
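As a concrete illustration of this control arrangement, the sketch below simulates a discrete PID loop that drives a stabilizing element toward a commanded position by adjusting a PWM duty cycle. It is a minimal sketch rather than our controller firmware: the gains and the read_position / set_pwm_duty hooks are hypothetical stand-ins for the position sensing and voice-coil drive of the modified lens.

```python
# Illustrative PID position hold (hypothetical interfaces, not the authors' firmware).

class PIDController:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measured):
        # Standard discrete PID: proportional + integral + derivative terms.
        error = setpoint - measured
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def hold_lens_position(target_um, read_position, set_pwm_duty, steps=1000, dt=1e-3):
    """Drive the lens element to target_um and hold it there.

    read_position() and set_pwm_duty() are placeholders for the lens's
    position sensor and voice-coil PWM drive; the gains are illustrative.
    """
    pid = PIDController(kp=0.8, ki=0.2, kd=0.05, dt=dt)
    for _ in range(steps):
        duty = pid.update(target_um, read_position())
        set_pwm_duty(max(-1.0, min(1.0, duty)))  # clamp to a valid duty-cycle range
```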

Conceptually, the capture of images needed for our approach is similar to exposure bracketing, which is already a pre-programmed behavior of DSLR cameras. Whereas exposure bracketing captures a sequence of images between which the exposure time and/or aperture are modified, our 'position bracketing' system captures a sequence of images between which the location of the OIS lens changes.
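The position-bracketing capture itself reduces to a short loop. The sketch below assumes hypothetical move_lens_to and trigger_exposure hooks for the external stabilizer controller and the camera shutter; it steps the IS element through K sub-lenslet offsets and exposes one LR light field at each.

```python
# Position bracketing sketch: capture one LR light field per programmed IS offset.
# move_lens_to() and trigger_exposure() are hypothetical hardware hooks.

def capture_position_bracket(delta_um, move_lens_to, trigger_exposure, k=3):
    """Return k raw frames taken at sub-lenslet stabilizer offsets of delta_um/k."""
    frames = []
    for i in range(k):
        offset = i * delta_um / k          # 0, delta/3, 2*delta/3 for k = 3
        move_lens_to(offset)               # hold the stabilizing element here
        frames.append(trigger_exposure())  # expose one LR light field
    return frames
```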

3.2. 3D Light Field Camera Prototype

To demonstrate IS-assisted SR, we built the 3D light field camera shown in Fig. 1. Rather than choose a 2D microlens array, we place a cylindrical lenslet array between the main lens and the camera sensor. This design, while still providing depth information, only suffers resolution loss in one direction. The cylindrical axis of the lenslet array is horizontal, so we keep the full vertical resolution and trade only horizontal spatial resolution for angular resolution. Specifically, we attach a cylindrical array of 40 cylindrical lenses onto the sensor surface of a Canon Rebel T2i DSLR camera. The pitch of each lens is 0.25mm, and the focal length is 1.6mm. The cylindrical lenslet array measures 10mm × 10mm and is placed roughly in the center of the 25mm × 17mm sensor. When capturing the LF, we set the f-number to 5.6 and use 3456 × 2304 resolution, though only the region of the sensor under the (smaller) lenslet array is useful.
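For intuition about the resulting sampling geometry, the back-of-the-envelope calculation below uses the quoted sensor width, horizontal pixel count, and lenslet pitch to estimate the pixel pitch and the number of pixels (angular samples) under each lenslet. The derived values are rough illustrative estimates, not calibrated figures from the prototype.

```python
# Rough geometry estimates from the quoted prototype numbers (illustrative only).
sensor_width_mm = 25.0       # quoted sensor width
horizontal_pixels = 3456     # quoted horizontal resolution
lenslet_pitch_mm = 0.25      # quoted lenslet pitch
array_width_mm = 10.0        # quoted lenslet-array width

pixel_pitch_mm = sensor_width_mm / horizontal_pixels    # ~0.0072 mm per pixel
pixels_per_lenslet = lenslet_pitch_mm / pixel_pitch_mm  # ~35 angular samples per lenslet
num_lenslets = array_width_mm / lenslet_pitch_mm        # 40 lenslets -> 40-pixel-wide views

print(f"pixel pitch ~{pixel_pitch_mm * 1e3:.1f} um, "
      f"~{pixels_per_lenslet:.0f} pixels per lenslet, "
      f"{num_lenslets:.0f} lenslets")
```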

We represent the light field with the two-plane parameterization [10, 11]. The camera's data are represented by two planes, Π_st and Π_uv, where (s, t) denotes a point on the main lens plane and (u, v) a point on the microlens plane. Assuming that the microlenses and the sensor pixels are aligned, all the light measured at a pixel 1) passes through the pixel's parent microlens, and 2) passes through the pixel's conjugate region on the main lens. These two regions therefore define a small 2D box in the 3D light field. The pixel value is the integral over this box, and each microlenslet separates the rays according to their direction, thereby recording many samples of the full light field impinging on the main lens, as shown in Fig. 2 (b).



Fig. 3. Our 3D light field sampling pattern. (a) A cylindrical lens refracts light in one direction. (b)(c) By moving the main lens, we fuse several LR light fields to generate an SR LF.

Recall that we only have a light field in the 2D slice (u, s). We use I(s) to represent the sub-aperture image at location s on the main lens. Suppose that the lenslet size is δ, and that each microlens covers K pixels in the horizontal direction, capturing K different views of the scene. Therefore, the resolution of I(s) is reduced by a factor of K. I(s) captures rays (s, u_n), where n = 1, ..., N and N is the number of lenslets on the microlens array. If we deliberately translate the microlens array by δ/K̄ in the positive direction of the u-axis, the ray captured by the n-th lenslet is (s, u_n + δ/K̄), where K̄ denotes the expected magnification factor and K̄ ≠ K.

As shown in Fig. 3 (a), if the scene depth is z and the main lens focal length is F, the focal plane within the camera is found at z′ = zF/(z − F). Assuming that the lenslet array is placed at distance v from the main lens and v′ from the sensor plane, the sampling points of a sub-aperture view I(s) on the focused scene at z′ should satisfy

$$p_{u_n} = \frac{z' u_n}{v} = \frac{z F u_n}{v (z - F)}, \qquad (1)$$

and the sampling rate is

$$f_{LR} = \frac{1}{p_{u_{n+1}} - p_{u_n}} = \frac{v (z - F)}{z F \delta}, \qquad (2)$$

where δ is the lenslet size. By moving the microlens array with sub-lenslet precision, we can capture K̄ more points p^k_{u_n}, k = 1, ..., K̄, between p_{u_n} and p_{u_{n+1}}. The distance between p^k_{u_n} and p^{k+1}_{u_n} is

$$d = p^{k+1}_{u_n} - p^{k}_{u_n} = \frac{z F \delta}{v (z - F) \bar{K}}. \qquad (3)$$

By combining the K̄ captured light fields, the sampling rate of the integrated super-resolved light field is

$$f_{SR} = \frac{1}{d} = \frac{v (z - F) \bar{K}}{z F \delta} = \bar{K} f_{LR}. \qquad (4)$$

Therefore, the SR light field has K̄ times the spatial resolution of each captured LR light field.
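As a numeric check of Eqs. (1)–(4), the sketch below evaluates the single-exposure and interleaved sampling rates for an assumed geometry and verifies that their ratio is K̄. The values of z, F, v, and δ are illustrative placeholders, not the prototype's calibrated parameters.

```python
# Numeric check of Eqs. (2)-(4) with placeholder geometry (millimetres).

def sampling_rates(z, F, v, delta, k_bar):
    """Return (f_LR, d, f_SR) for scene depth z, main-lens focal length F,
    main-lens-to-lenslet distance v, lenslet pitch delta, and k_bar shifts."""
    f_lr = v * (z - F) / (z * F * delta)        # Eq. (2)
    d = z * F * delta / (v * (z - F) * k_bar)   # Eq. (3)
    f_sr = 1.0 / d                              # Eq. (4), equals k_bar * f_lr
    return f_lr, d, f_sr

f_lr, d, f_sr = sampling_rates(z=2000.0, F=100.0, v=110.0, delta=0.25, k_bar=3)
assert abs(f_sr - 3 * f_lr) < 1e-9              # interleaving triples the sampling rate
```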

3.3. Light Field Super-Resolution

We have demonstrated that sub-lenslet movement of the microlens array enables super-resolving the LR sub-aperture images. Next, we show how to modify the main lens position (s, t) to achieve a sub-lenslet shift on the microlens plane (u, v). If we fix the sensor and lenslet array positions and move only the main lens, we need to specify the relationship between the main lens and microlens array movements. Suppose that the origin O is the optical center of the main lens, and that we translate the lens by ∆ so that the focused scene within the camera shifts by d′ and each microlens captures the same set of rays as if the lenslet array had moved by δ/K̄, as shown in Fig. 3 (b). The correspondence between ∆, d′ and δ should satisfy

$$d' = \frac{(z + z')\Delta}{z} = d + d'', \qquad (5)$$

where

$$d = \frac{z' \delta}{v \bar{K}} \quad \text{and} \quad d'' = \frac{(v - z')\Delta}{z'}. \qquad (6)$$

Combining Eqn. 5 and Eqn. 6, we obtain

$$\Delta = \frac{\delta z (z')^2}{\bar{K} v \left((z')^2 + 2 z z' - v z\right)} = \frac{\delta F^2 z^2}{\bar{K} v \left((2F - v) z^2 + F (2v - F) z - v F^2\right)}. \qquad (7)$$

In the general case, z ≫ F and z ≫ v. We can therefore approximate ∆ by

$$\Delta \approx \frac{\delta F^2}{\bar{K} v (2F - v)}. \qquad (8)$$

By precisely moving the main lens through the stabilizer controller, we are able to achieve an SR light field preserving the full spatial resolution of the DSLR. Moreover, by carefully choosing the motion step, our light field prototype can break the limitation arising from the sensor's pixel count. Fig. 3 (c) shows a real scene example of our captured light field. More SR results are presented in Fig. 2 (c), 4, and 5.
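As a sanity check on Eqs. (7) and (8), the sketch below computes the exact main-lens translation ∆ and its far-field approximation for an assumed geometry; the specific values of z, F, v, and δ are placeholders rather than measured parameters of our lens.

```python
# Main-lens translation for a delta/k_bar sub-lenslet shift (Eqs. (7) and (8)).

def delta_exact(z, F, v, delta, k_bar):
    """Eq. (7): exact translation for scene depth z, focal length F,
    lenslet-plane distance v, lenslet pitch delta, and magnification k_bar."""
    num = delta * F**2 * z**2
    den = k_bar * v * ((2 * F - v) * z**2 + F * (2 * v - F) * z - v * F**2)
    return num / den

def delta_approx(F, v, delta, k_bar):
    """Eq. (8): approximation valid when z >> F and z >> v."""
    return delta * F**2 / (k_bar * v * (2 * F - v))

z, F, v, delta, k_bar = 3000.0, 100.0, 110.0, 0.25, 3   # placeholder values (mm)
print(delta_exact(z, F, v, delta, k_bar), delta_approx(F, v, delta, k_bar))
```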



Fig. 4. Two SR light field results. First row: three captured low-resolution (LR) LFs. Second row: the resized LR sub-aperture images and the disparity/depth map from the LR light field. Third row: our SR results from combining the three captured LR light fields, from which we get better depth.

4. EXPERIMENTS

For 3D light field SR, we first discuss camera calibration, and then show that our method improves both depth estimation performance and quality of the sub-aperture images.

Camera Calibration Once the cylindrical lens array is placed on the sensor, its exact placement is unknown, and the baseline between cylindrical lenses is a non-integer multiple of the pixel pitch. We capture an image of a white target where, because of vignetting, the brightest line along each lenslet image is the center of the cylindrical lens. We fit the mean value of each column in the white image to a sine curve a + b sin(Tx + c). Consequently, the baseline between adjacent microlenses is δ = 2π/T, and the centers of the cylindrical lenses correspond to lines located at x = (π/2 + 2kπ − c)/T along the horizontal axis, where k = 0, 1, 2, ..., n − 1. We extract the same columns under each microlens to form sub-aperture images and re-arrange each captured light field into sub-aperture images, as in Fig. 2 (c).
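A minimal implementation of this calibration step, assuming the white image is available as a NumPy array and using scipy.optimize.curve_fit for the sine fit, might look as follows. The initial period guess T0 and the helper's name and arguments are our own; only the model a + b sin(Tx + c) and the peak formula follow the description above.

```python
# Sketch of the white-image lenslet calibration (sine fit of column means).
import numpy as np
from scipy.optimize import curve_fit

def calibrate_lenslets(white_image, T0):
    """white_image: 2D array of a white target; T0: initial guess for the period T."""
    col_means = white_image.mean(axis=0)
    x = np.arange(col_means.size, dtype=float)

    def model(x, a, b, T, c):
        return a + b * np.sin(T * x + c)

    p0 = [col_means.mean(), col_means.std(), T0, 0.0]
    (a, b, T, c), _ = curve_fit(model, x, col_means, p0=p0)

    pitch_px = 2 * np.pi / T                        # baseline between lenslets, in pixels
    k = np.arange(int(col_means.size * T / (2 * np.pi)) + 1)
    centers = (np.pi / 2 + 2 * np.pi * k - c) / T   # brightest columns = lenslet centers
    centers = centers[(centers >= 0) & (centers < col_means.size)]
    return pitch_px, centers
```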

Depth Estimation A crucial problem with our 3D light field sub-aperture images is that, even though they preserve the vertical resolution of the DSLR, depth recovery is challenging if δ is large. The sub-aperture images have coarse sampling in the horizontal direction, making it difficult to establish correspondence and leading to poor depth estimates without SR, as shown in Fig. 4. We apply our method to three captured light field images and achieve 3× SR in the resolution-limited direction. Notice that there are only 40 cylindrical lenses on the microlens array, which limits the horizontal resolution of each sub-aperture image to 40 pixels (the vertical resolution remains high).
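The interleaving step itself is deliberately simple. The sketch below merges K LR sub-aperture images of the same view, captured at successive δ/K shifts, by filling every K-th column of the output. The assumed ordering of the shifts along the u-axis (and hence the column offsets) is a convention that would need to match the sign of the actual lens motion.

```python
# Column-interleaving of K shifted LR sub-aperture views into one SR view.
import numpy as np

def interleave_views(lr_views):
    """lr_views: list of K HxW arrays of the same view at K sub-lenslet shifts.
    Returns an H x (K*W) array with columns interleaved in shift order."""
    k = len(lr_views)
    h, w = lr_views[0].shape
    sr = np.empty((h, k * w), dtype=lr_views[0].dtype)
    for i, view in enumerate(lr_views):
        sr[:, i::k] = view   # the i-th shift fills every k-th column, offset by i
    return sr

# e.g. sr_view = interleave_views([view_shift_0, view_shift_1, view_shift_2])
```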

Sub-Aperture Image Quality Figure 5 shows the visual improvement in sub-aperture images captured with our method. Regarding depth information, we adopt a multi-view disparity estimation procedure [12] to compute depth maps.

Fig. 5. In addition to depth estimation, our method improves sub-aperture images. The left column shows sub-aperture images from a single capture, with very low spatial resolution in the horizontal direction. The right column shows the same sub-aperture images after our method. The middle column blends the LR (yellow) and SR (cyan) images to show the improvement.

As shown in Fig. 4, each captured light field has low resolution along the horizontal axis, and it is hard to tell the parallax between two LR sub-aperture images. By combining three captured LFs, we increase the resolution in this direction and recover better depth information.

5. CONCLUSION

We have presented a computational imaging solution for multi-image LF SR at a much lower computational cost, achieved by leveraging the positional accuracy of modern IS lenses. This greatly reduces the algorithmic complexity while retaining high performance. We show that simply capturing and interleaving light field samples taken under different IS positions recovers a significant proportion of the resolution lost to the microlens array. Though we demonstrate this on a light field camera with cylindrical microlenses, the extension to cameras with 2D microlens arrays is straightforward.


6. REFERENCES

[1] Todor Georgeiv, Ke Colin Zheng, Brian Curless, David Salesin, Shree Nayar, and Chintan Intwala, "Spatio-angular resolution tradeoff in integral photography," in Eurographics Symposium on Rendering, 2006.

[2] Tom E. Bishop, Sara Zanetti, and Paolo Favaro, "Light field superresolution," in IEEE International Conference on Computational Photography (ICCP), 2009, pp. 1–9.

[3] Tom E. Bishop and Paolo Favaro, "The light field camera: Extended depth of field, aliasing, and superresolution," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 5, pp. 972–986, 2012.

[4] Vivek Boominathan, Kaushik Mitra, and Ashok Veeraraghavan, "Improving resolution and depth-of-field of light field cameras using a hybrid imaging system," in International Conference on Computational Photography (ICCP), 2014.

[5] Ashok Veeraraghavan, Ramesh Raskar, Amit Agrawal, Ankit Mohan, and Jack Tumblin, "Dappled photography: Mask enhanced cameras for heterodyned light fields and coded aperture refocusing," ACM Transactions on Graphics, vol. 26, no. 3, p. 69, 2007.

[6] Dikpal Reddy, Jiamin Bai, and Ravi Ramamoorthi, "External mask based depth and light field camera," in ICCV '13 Workshop on Consumer Depth Cameras for Vision, 2013.

[7] H. Nagahara, S. Kuthirummal, C. Zhou, and S. K. Nayar, "Flexible depth of field photography," in European Conference on Computer Vision, 2008.

[8] Scott McCloskey, Kelly Muldoon, and Sharath Venkatesha, "Motion aware motion invariance," in IEEE International Conference on Computational Photography (ICCP), 2014, pp. 1–9.

[9] Nianyi Li, Scott McCloskey, and Jingyi Yu, "Jittered exposures for image super-resolution," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2018.

[10] Marc Levoy and Pat Hanrahan, "Light field rendering," in Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, 1996, pp. 31–42.

[11] Steven J. Gortler, Radek Grzeszczuk, Richard Szeliski, and Michael F. Cohen, "The lumigraph," in Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, 1996, pp. 43–54.

[12] Vaibhav Vaish, Marc Levoy, Richard Szeliski, C. Lawrence Zitnick, and Sing Bing Kang, "Reconstructing occluded surfaces using synthetic apertures: Stereo, focus and robust measures," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2006, vol. 2, pp. 2331–2338.
