
Display-aware Image Editing

Won-Ki Jeong, Harvard University
Micah K. Johnson, Massachusetts Institute of Technology
Insu Yu, University College London
Jan Kautz, University College London
Hanspeter Pfister, Harvard University
Sylvain Paris, Adobe Systems, Inc.

Abstract

We describe a set of image editing and viewing tools that explicitly take into account the resolution of the display on which the image is viewed. Our approach is twofold. First, we design editing tools that process only the visible data, which is useful for images larger than the display. This encompasses cases such as multi-image panoramas and high-resolution medical data. Second, we propose an adaptive way to set viewing parameters such as brightness and contrast. Because we deal with very large images, different locations and scales often require different viewing parameters. We let users set these parameters at a few places and interpolate satisfying values everywhere else. We demonstrate the efficiency of our approach on different display and image sizes. Since the computational complexity to render a view depends on the display resolution and not on the input image resolution, we achieve interactive image editing even on a 16-gigapixel image.

1. Introduction

Gigapixel images are now commonplace, with dedicated devices to automate the image capture [1, 2, 24, 37] and image stitching software [5, 11]. These large pictures have a unique appeal compared to normal-sized images. Fully zoomed out, they convey a global sense of the scene, while zooming in lets one dive in, revealing the smallest details, as if one were there. In addition, modern scientific instruments such as electron microscopes or sky-surveying telescopes are able to generate very high-resolution images for scientific discovery at the nano- as well as at the cosmological scale. We are interested in two problems related to these large images: editing them and viewing them. Editing such large pictures remains a painstaking task. Although after-exposure retouching plays a major role in the rendition of a photo [3], and enhancing scientific images is critical to their interpretation [8], these operations are still mostly out of reach for images above 100 megapixels. Standard editing techniques are designed to process images that have at most a few megapixels. While significant speedups have been obtained at these intermediate resolutions, e.g., [14, 16, 17, 28], major hurdles remain to interactively edit larger images. For instance, optimization tools such as least-squares systems and graph cuts become impractical when the number of unknowns approaches or exceeds a billion. Furthermore, even simple editing operations become costly when repeated for hundreds of millions of pixels.

The basic insight of our approach is that the image is viewed on a display with limited resolution, and only a subset of the image is visible at any given time. We describe a series of image editing operators that produce results only for the visible portion of the image and at the displayed resolution. A simple and efficient multi-resolution data representation (§ 2) allows each image operator to quickly access the currently visible pixels. Because the displayed view is computed on demand, our operators are based on efficient image pyramid manipulations and designed to be highly data-parallel (§ 3). When the user changes the view location or resolution, we simply recompute the result on the fly.

Further, existing editing tools do not account for the fact that very large images can be seen at multiple scales. For instance, a high-resolution scan as shown in the companion video reveals both the overall structure of the brain and the fine entanglement between neurons. In existing software, settings such as brightness and contrast are the same whether one looks at the whole image or at a small region. In comparison, we let the user specify several viewing settings for different locations and scales. This is useful for emphasizing different structures, e.g., on a brain scan, or expressing different artistic intents, e.g., on a photo. We describe an interpolation scheme, motivated by a user study, to infer the viewing parameters where the user has not specified any settings (§ 4). This adaptive approach enables the user to obtain a pleasing rendition at all zoom levels and locations while setting viewing parameters only at a few places.

The novel contributions of this work are twofold. First, we describe editing operators such as image stitching and seamless cloning that are output-sensitive, i.e., the associated computational effort depends only on the display resolution. Our algorithms are based on Laplacian pyramids, which we motivate by a theoretical study of the constraints required to be display-aware. Second, we propose an interpolation scheme, motivated by a user study, to infer viewing parameters where the user has not specified any settings. We illustrate our approach with a prototype implemented on the GPU and show that we can interactively edit images as large as 16 gigapixels.

1.1. Related Work

The traditional strategy with large images is to process them at full resolution and then rescale and crop according to the current display. As far as we know, this is what commercial software commonly does. However, this simple approach quickly becomes impractical with large images, especially with optimization-based methods such as graph cuts and Poisson solvers.

Fast image filters have been proposed to speed up operations such as edge-aware smoothing [4, 14, 15, 17, 32], seamless compositing [5, 16, 22], inpainting [9], and selection [28]. Although these algorithms reduce the computation times, they have been designed for standard-size images, and the entire picture at full resolution is eventually processed. In comparison, we propose display-aware algorithms that work locally in space and scale such that only the visible portion of the data is processed.

Berman et al. [10] and Velho and Perlin [36] describe multi-scale painting systems for large images based on wavelets. From an application perspective, our work is complementary, as we do not investigate methods for painting but rather for adaptive viewing and more advanced editing such as seamless cloning. Technically speaking, our methods operate in a display-aware fashion, not in a multi-scale fashion. That is, we apply our edits on the fly to the current view and never actually propagate the results to all scales. Further, it is unclear how to extend their approach from painting to algorithms such as seamless cloning.

Pinheiro and Velho [34] and Kopf et al. [24] propose multi-resolution tiled memory-management systems for viewing large data. Our data management follows similar design principles, but supports multiple input images that can be aligned to form a local image pyramid on the fly, without managing a pre-built global multi-resolution image pyramid. It also naturally supports out-of-core computations on graphics hardware with limited memory. Jeong et al. [21] propose an interactive gigapixel viewer suitable for multi-layer microscopy image stacks, which shares a similar idea with our data-management scheme. In comparison, we focus on single-layer images and also tackle editing issues.

Kopf et al. [24] apply a histogram-based tone mapper to automatically adjust the current view of large HDR images. Our approach can also run automatically, but it also lets the user override the default settings as many times as desired. This allows users to make adjustments that adapt to the current view and may reflect subjective intents. Furthermore, we propose more complex output-sensitive algorithms for tasks such as seamless cloning.

Shantzis [35] describes a method to limit the amount of computation by only processing the data within the bounding box of each operator. We extend this approach in several ways. Unlike Shantzis, we deal with changes of zoom level and ignore the high-frequency data when they are not visible. This property is nontrivial, as we shall see (§ 3.1). We also design new algorithms that enable display-aware processing, such as our stitching method based on local computation only; in comparison, the standard method based on graph cuts is global, i.e., the bounding box would cover the entire image. Further, we also deal with viewing parameters, which is not in the scope of Shantzis' work.

2. Data Representation

A major aspect of our approach is that the view presented to the user is always computed on the fly. From a data-structure point of view, this implies that the displayed pixel data have to be readily available and that we can rapidly determine how to process them. To achieve this, we use several mechanisms detailed in the rest of this section.

Global Space and Image Tiles. Our approach is organized around a coordinate system in which points are located by their $(x, y)$ spatial location and the scale $s$ at which they are observed. A unit scale $s = 1$ corresponds to the full-resolution data, while $s = 1/n$ corresponds to the image downsampled by a factor of $n$. We use this coordinate system to define a global space in which we locate data with respect to the displayed image that the user observes (Fig. 1). Typically, we have several input images that make up, e.g., a panorama. For each image $I_i$, we first compute a geometric transformation $g_i$ that aligns it with the others by specifying its location in global space. If we have only one input image, then $g$ is the identity function. The geometric alignment can either be pre-computed before the editing session, or each image can be aligned on the fly when it is displayed. In the former case, we use feature-point detection and homography alignment, e.g., [11]. In the latter case, the user interactively aligns the images in an approximate manner; we then automatically register them in a display-aware fashion by maximizing the cross-correlation between visible overlapping areas. This is useful for images that are being produced on-line by automated scientific instruments.

We decompose all input images into tiles. For each tile, we pre-compute a Gaussian pyramid to enable access to any portion of the input images at arbitrary resolutions. For resolutions that we have not pre-computed, we fetch the pyramid level with a resolution just higher than the requested one and downsample it on the fly. The resampling step is essentially free on graphics hardware, and although we load more data than needed, the overhead is small compared to loading the full-resolution tile or the entire input image. We further discuss the computational complexity of this operation in Section 5.
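To make the fetch logic concrete, here is a minimal Python sketch of the level-selection rule. The tile size, the function names, and the CPU-side nearest-neighbor resampling (standing in for the GPU's essentially free hardware resampler) are our own illustrative choices, not the paper's implementation.

```python
import numpy as np

TILE = 256  # assumed tile size; the paper does not specify one

def visible_tiles(viewport_xy, viewport_wh, s):
    """Indices (row, col) of tiles intersecting the viewport.

    viewport_xy is the top-left corner in full-resolution global
    coordinates; viewport_wh is the viewport size in display pixels;
    s <= 1 is the paper's scale (s = 1/n shows the image downsampled by n).
    """
    x0, y0 = viewport_xy
    w, h = viewport_wh
    x1, y1 = x0 + w / s, y0 + h / s        # extent in full-resolution pixels
    cols = range(int(x0 // TILE), int(np.ceil(x1 / TILE)))
    rows = range(int(y0 // TILE), int(np.ceil(y1 / TILE)))
    return [(r, c) for r in rows for c in cols]

def fetch(tile_pyramid, s):
    """Fetch a tile at scale s from its pre-computed Gaussian pyramid.

    We pick the pre-computed level just above the requested resolution
    (scale 1/2^k) and resample the residual factor on the fly.
    """
    k = int(np.floor(-np.log2(s)))          # finest level with 2^-k >= s
    level = tile_pyramid[k]
    zoom = s * 2 ** k                       # residual factor in (0.5, 1]
    ys = (np.arange(int(level.shape[0] * zoom)) / zoom).astype(int)
    xs = (np.arange(int(level.shape[1] * zoom)) / zoom).astype(int)
    return level[np.ix_(ys, xs)]            # nearest-neighbor stand-in
```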

Operator Representation. We distinguish two types of operators. Local operators, such as copy-and-paste or image cloning, affect only a subset of the image. We store their bounding box in global space as well as an index that indicates in which order the user has performed the edits. We did not include the scale $s$ in this representation because we could not conceive of any realistic scenario in which a local operator would apply only at certain scales, but including it would be straightforward if needed. When the user moves the display to a new position, the viewport defines a display rectangle at a given scale in global space. We test each operator and keep only the ones whose bounding box intersects the viewport. In our current implementation, operators are stored in a list and we test them all, since bounding-box intersections are efficient. Once we have identified the relevant operators, we apply them in order to the visible pixels at the current resolution. Global operators, i.e., brightness, contrast, and saturation, affect all the pixels. We apply these transformations after the local operators and always in the same order: brightness, contrast, saturation. If the user modifies a setting twice, we keep only the last value. We found that it is beneficial to let users specify different values at different positions and scales. In this case, we store one setting at each $(x, y, s)$ location where the user makes an adjustment and interpolate these values to other locations (§ 4).
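As an illustration, the bookkeeping can be as simple as the following sketch; `LocalOp`, `render`, and the tuple-based bounding boxes are hypothetical names for exposition, not the prototype's API.

```python
from dataclasses import dataclass, field
from typing import Callable
import itertools

_edit_counter = itertools.count()

@dataclass
class LocalOp:
    bbox: tuple                      # (x0, y0, x1, y1) in global space
    apply: Callable                  # view -> view, at display resolution
    index: int = field(default_factory=lambda: next(_edit_counter))

def intersects(a, b):
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def render(view, view_bbox, local_ops, global_ops):
    # Keep only local operators whose bounding box meets the viewport and
    # replay them in the order the user performed the edits.
    for op in sorted((o for o in local_ops if intersects(o.bbox, view_bbox)),
                     key=lambda o: o.index):
        view = op.apply(view)
    # Global operators always apply, in a fixed order:
    # brightness, then contrast, then saturation (last setting wins).
    for g in global_ops:
        view = g(view)
    return view
```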

3. Local Operators

In this section, we describe local editing operators. The algorithms are designed to be display-aware, that is, we process only the visible portion of the image at the current resolution and perform only a fixed amount of computation per pixel. We first study these operators from a theoretical standpoint and then illustrate our strategy on two specific tasks: seamless cloning and panorama stitching.

3.1. Theoretical Study

We study the requirements that an operator $f$ must satisfy to be display-aware. The function $f$ takes an image $I$ as input and creates an image $O$ as output, that is, $O = f(I)$. To be display-aware, $f$ must be able to compute the visible portion of the output using only the corresponding input data. First, we characterize how the visible portion of an image relates to the full-resolution data. We consider an image $X$. To be displayed, $X$ is resampled at the screen resolution and cropped. We only consider the case where the screen resolution is lower than the image resolution; the opposite case only requires interpolating pixel values and does not need special treatment. Downsampling the image $X$ is done with a low-pass filter $\ell$ followed by a comb filter. Assuming a perfect low-pass filter, $\ell$ is a multiplication by a box filter in the Fourier domain. After this, the comb filter does not remove any information and we can ignore it. The other effect of displaying the image on a screen is that only part of it is visible. This is a cropping operation $c$ that is a multiplication by a box function in the image domain. We define the operator $s(X) = c(\ell(X))$ that displays $X$ on a screen.

To be display-aware, $f$ must satisfy $s(f(I)) = f(s(I))$, that is, we must be able to compute the visible portion of the output $s(f(I))$ using only the visible portion of the input $s(I)$. A sufficient condition is that $f$ commutes with $\ell$ and $c$. $\ell$ can be any arbitrary box centered in the Fourier domain. To commute with it, $f$ must be such that the content of $f(X)$ at a frequency $(u_0, v_0)$ depends only on the content of $X$ at frequencies $|u| \le |u_0|$ and $|v| \le |v_0|$; the rationale is that these frequencies are preserved by $\ell$ even if its cut-off is $(u_0, v_0)$. The crop function $c$ applies an arbitrary box in image space. For $f$ to commute with it, $f$ must be a point-wise operator, since there is no guarantee that adjacent pixels are available. However, these two conditions are too strict to allow for any useful filter. We relax the latter one by considering an "extended screen": for instance, for an operator based on $5 \times 5$ windows, we add a 2-pixel margin. We apply a similar relaxation in the Fourier domain by adding a "frequency margin", i.e., the input image is resampled at a slightly higher resolution, typically the closest power-of-two resolution. In both cases, the number of processed pixels remains on the same order as the display resolution. A strategy to satisfy these requirements is to decompose the image $I$ into a Laplacian pyramid and process each level independently and locally. If a process generates out-of-band content, we could post-process the levels to remove it, but we did not find this necessary in the examples shown in this paper. This approach yields data-parallel algorithms, since constructing a pyramid involves purely local operations, and so do our display-aware filters.

Figure 1. Our approach is based on a caching scheme that ensures that the pixel data are readily available to the editing algorithms. We decompose each input image into tiles and compute a multi-resolution pyramid for each tile. We register the tiles into a common coordinate system, the global space. We determine the visible tiles by intersecting them with the viewport rectangle. To the display pixels we either apply local operators with a bounding box that intersects the viewport or interpolated global operators such as brightness and contrast.
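The following sketch illustrates this strategy on grayscale float arrays in $[0, 1]$: a Laplacian pyramid built over the padded visible window, a point-wise operation per level, and a collapse. The binomial filter, the fixed margin, and all function names are our simplifications; the paper's GPU pyramids and exact per-level margins are not spelled out here.

```python
import numpy as np

def _blur1d(img, k, axis):
    return np.apply_along_axis(lambda v: np.convolve(v, k, 'same'), axis, img)

def downsample(img):
    """5-tap binomial low-pass followed by decimation (one pyramid level)."""
    k = np.array([1, 4, 6, 4, 1], float) / 16
    return _blur1d(_blur1d(img, k, 1), k, 0)[::2, ::2]

def upsample(img, shape):
    """Zero-insertion upsampling to `shape`, then binomial interpolation."""
    up = np.zeros(shape)
    up[::2, ::2] = img
    k = np.array([1, 4, 6, 4, 1], float) / 8    # doubled gain for inserted zeros
    return _blur1d(_blur1d(up, k, 1), k, 0)

def laplacian_pyramid(img, levels):
    pyr = []
    for _ in range(levels):
        down = downsample(img)
        pyr.append(img - upsample(down, img.shape))
        img = down
    pyr.append(img)                              # low-pass residual
    return pyr

def collapse(pyr):
    img = pyr[-1]
    for lap in reversed(pyr[:-1]):
        img = upsample(img, lap.shape) + lap
    return img

def display_aware_filter(visible, per_level_op, levels=4, margin=2):
    """Apply a point-wise-per-level operator to the padded visible window.
    The margin stands in for the 'extended screen': a 5x5 operator at
    pyramid level l reaches roughly margin * 2^l full-resolution pixels
    beyond the viewport, which the tile cache can supply without touching
    the rest of the image."""
    padded = np.pad(visible, margin, mode='edge')
    out = collapse([per_level_op(lv) for lv in laplacian_pyramid(padded, levels)])
    return out[margin:-margin, margin:-margin]
```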

3.2. On-the-fly Image Alignment and Stitching

Existing large-scale image viewers require a globally aligned and stitched full-resolution panorama to build a multi-resolution image pyramid [24, 34]. Poisson compositing is commonly used to stitch multiple images into a panorama [5, 25], but for very large images even optimized methods become costly. Further, recent automated image scanners [20] can produce large images at a speed of up to 11 GB/s. In such a scenario, it is useful to get a quick overview of the entire image with coarse alignment, and to refine the alignment on the fly as the user zooms in. Our on-the-fly alignment assumes that the input images are approximately in the right position in global space. This is the case for automated panorama acquisition systems and scientific instruments. Otherwise, the user can manually align them or run a feature-detection algorithm such as [11]. We first adjust the images to have the same exposure and white balance. The affine transformation between images is then automatically refined by maximizing the cross-correlation between overlapping regions; we implemented this using gradient descent on the GPU. The alignment is computed for the current zoom level and automatically refined when the user zooms further (see the video; note that in the video, refinement is not automatic so that its effect is visible). We stitch the images using the pyramid-based scheme of Burt and Adelson [13]. At each pixel with an overlap, we select the image whose border is farthest, yielding a binary mask for each input $I_i$. We compute Gaussian pyramids $G_i$ from these masks and Laplacian pyramids $L_i$ from the input images $I_i$. We linearly blend each level $n$ independently to form a new Laplacian pyramid $L^n = \sum_i G_i^n L_i^n / \sum_i G_i^n$. Finally, we collapse the pyramid $L$ to obtain the result.
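Reusing the pyramid helpers from the Section 3.1 sketch, the Burt-Adelson blend reduces to a per-level weighted average; the normalization by $\sum_i G_i^n$ handles pixels where the blurred masks do not sum to one. This is an illustrative re-implementation, not the paper's GPU code.

```python
import numpy as np

def gaussian_pyramid(img, levels):
    pyr = [img]
    for _ in range(levels):
        pyr.append(downsample(pyr[-1]))   # helper from the Sec. 3.1 sketch
    return pyr

def blend(images, masks, levels=4, eps=1e-6):
    """Burt-Adelson blend on the visible window (Sec. 3.2):
    L^n = sum_i G_i^n L_i^n / sum_i G_i^n, then collapse."""
    laps = [laplacian_pyramid(img, levels) for img in images]
    gaus = [gaussian_pyramid(m.astype(float), levels) for m in masks]
    out = []
    for n in range(levels + 1):
        num = sum(g[n] * lp[n] for g, lp in zip(gaus, laps))
        den = sum(g[n] for g in gaus)
        out.append(num / np.maximum(den, eps))
    return collapse(out)
```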

3.3. Push-Pull Image Cloning

Seamless copy-pasting is a standard tool in editing packages [19, 33]. Most implementations rely on solving the Poisson equation, and even if optimized algorithms exist [5, 22, 30], this strategy requires accessing every pixel at the finest resolution, which does not suit our objectives. Farbman et al. [16] exploit the fact that seamless cloning boils down to smoothly interpolating the color differences at the foreground-background boundary, and propose an optimization-free method based on a triangulation of the pasted region. Although it might be possible to adapt Farbman's triangulation to our needs, we propose a pyramid-based method that naturally fits our display-aware context thanks to its multi-scale formulation, and that does not incur the triangulation cost.

Figure 2. Although our result and the output of Farbman et al. [16] are not the same, both are satisfying. The input images and Farbman's result come from [16].

We perform a push-pull operation [12] on the color differences at the boundary. We consider a background image $B$ and a foreground image $F$ with a binary mask $M$. We compute the color offset $O = B - F$ for each pixel on the boundary of $M$. During the pull phase, we build a Gaussian pyramid from the $O$ values. Since $O$ is only defined at the mask boundary, we ignore all the undefined values during this computation and obtain a sparse pyramid where only some pixels have defined values (and most are empty). Then we collapse the pyramid starting from the coarsest level. In particular, we push pixels with a defined value down to pixels at finer levels that are empty. To avoid blockiness, we employ a bicubic filter during the push phase. This process smoothly fills in the hole [12] and generates an offset map that we add to the foreground pixels before copying them on top of the background. We apply this process in a display-aware fashion by considering only the visible portion of the boundary. When the user moves the view, appearing and disappearing boundary constraints can induce flickering. Since the offset membrane $O$ is smooth, flickering is only visible near the mask boundary. Thus, we run our process on a slightly extended viewport so that flickering occurs outside the visible region. In practice, we found that extending it by 20 pixels in each direction is enough. Zooming in and out can also cause flickering because the alignment between the boundary and the pyramid pixel grid varies. We address this issue by scaling the data to the next power of two, which ensures that the alignment remains consistent.
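A sketch of the push-pull fill on a sparse offset map, per color channel. For brevity it reuses the binomial `downsample`/`upsample` helpers from Section 3.1 instead of the bicubic push filter the paper uses, and all function names are ours.

```python
import numpy as np

def push_pull(offsets, defined, levels=6):
    """Fill the sparse boundary-offset map O = B - F (valid only where
    `defined` is True). Pull: average defined values up the pyramid.
    Push: fill empty finer pixels from coarser levels."""
    vals = [offsets * defined]
    wts = [defined.astype(float)]
    for _ in range(levels):                       # pull phase
        vals.append(downsample(vals[-1]))
        wts.append(downsample(wts[-1]))
    for l in range(levels, 0, -1):                # push phase
        v = vals[l] / np.maximum(wts[l], 1e-6)    # normalized coarse values
        has = (wts[l] > 0).astype(float)
        up_v = upsample(v * has, vals[l - 1].shape)
        up_w = upsample(has, wts[l - 1].shape)
        empty = wts[l - 1] == 0
        vals[l - 1][empty] = np.where(up_w[empty] > 0,
                                      up_v[empty] / np.maximum(up_w[empty], 1e-6),
                                      0.0)
        wts[l - 1][empty] = (up_w[empty] > 0).astype(float)
    return vals[0] / np.maximum(wts[0], 1e-6)

# Cloning then amounts to (on the extended viewport):
#   membrane = push_pull(B - F, boundary_mask)
#   result   = np.where(paste_mask, F + membrane, B)
```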

4. Global Operator Interpolation

For global operators, we have implemented the traditional brightness, contrast, and saturation adjustments. These operators raise specific issues in the context of large images. Figure 3 shows the difference between an image that is fully zoomed out and one fully zoomed in on a shadow region: the same viewing settings cannot be applied to both. Our solution is to adapt the parameters to the location and zoom level. Our approach is inspired by the automatic tone mapping described by Kopf et al. [24]. Similarly to this technique, our approach can be fully automatic, but we also extend it to let the user control the settings and offer the possibility to specify different parameters at different locations in the image. We conducted a user study to gain intuition on how to adapt parameters to the current view.

4.1. User Study

We ran a study on Amazon Mechanical Turk where we asked users to adjust the brightness, contrast, and saturation of a set of 25 images. We repeated the experiment with new users and another set of 25 images so that we collected more data while preventing fatigue and boredom. The set consisted of 5 crops at various locations and zoom levels from each of 10 different panoramas, for a total of 50 images. We asked the users to adjust the images to obtain a pleasing rendition that was "like a postcard: balanced and vibrant, but not unnatural." Users adjusted brightness, contrast, and saturation, and the initial positions of the sliders were randomized. In total, 27 unique users participated in our study. However, some users made random adjustments to collect the fee. We pruned these results through an independent study, where different users chose between the original and edited images to select which image in the pair was more like a postcard. We kept the results of a given user if his images received at least 65% positive votes. After this, 20 unique users remained.

To analyze a user's edits, we converted the input and output images into the CIE LCH color space. As an initial analysis, and inspired by the work on photographic style of Bae et al. [7], we compare the space of lightness histograms before and after editing. We estimate the size of each space by summing the Earth Mover's Distance (EMD) [27] between all pairs of lightness histograms. If the histogram actually characterizes a user's preference, we expect the size of this space to be smaller after the edits. On average, a user's edits reduced the size of the histogram space by 46% compared to the randomized inputs that the user saw, and by 14% compared to the original non-randomized images (not seen by the users), which confirms that the histogram characterizes users' preferences. We also analyzed the variance in the distance measurements and found that all users decreased the variance in histogram distances compared to the original images. These findings suggest that an interpolation scheme that decreases histogram distances is a good model of user preferences when editing images.
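For the 1D lightness histograms used here, the EMD reduces to the $L_1$ distance between cumulative distributions, which makes the pairwise "size of the histogram space" cheap to compute. A minimal sketch with illustrative names; the bin count and value range are our assumptions.

```python
import numpy as np

def lightness_hist(L, bins=64):
    """Normalized histogram of the CIE L channel (values in [0, 100])."""
    h, _ = np.histogram(L, bins=bins, range=(0, 100))
    return h / h.sum()

def emd_1d(h1, h2):
    """For normalized 1D histograms, the Earth Mover's Distance reduces
    to the L1 distance between the two cumulative distributions."""
    return np.abs(np.cumsum(h1) - np.cumsum(h2)).sum()

def histogram_space_size(hists):
    """Sum of pairwise EMDs, the 'size' of a set of lightness histograms;
    a user's edits shrink it if they make the histograms more alike."""
    return sum(emd_1d(hists[i], hists[j])
               for i in range(len(hists)) for j in range(i + 1, len(hists)))
```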

4.2. Propagation of Edits

The goal of edit propagation is to determine a set of parameters for the current view based on other edits in the image. In the user study, we observed that users tend to make the histograms of images more similar. Accordingly, our approach seeks parameters that make the current histogram close to the histograms of nearby edited regions. The fully zoomed-out view always counts as an edited region, even if the user keeps the default settings. If the user does not specify any edit, our method is fully automatic, akin to Kopf's viewer [24], and uses the zoomed-out view as reference. However, the user can specify edits at any time, and our method then starts interpolating the user's edits. Let $(x_v, y_v, s_v)$ be the spatial and scale coordinates of the current view. We combine the histograms of the $k$ closest edits into a target histogram. We use the Earth Mover's Distance on the image histograms to find these nearest neighbors; this metric can be interpreted as a simple scene similarity that can be computed efficiently, unlike more complex methods [31]. Drawing from work on texture synthesis [29], we interpolate the inverse cumulative distribution functions (CDFs): $C_t^{-1} = \sum_{i=1}^{k} w_i C_i^{-1} / \sum_{i=1}^{k} w_i$, where $C_t$ is the target CDF created by linearly combining nearby CDFs $C_i$ with weights $w_i$. We use inverse distances in histogram space as weights: $w_i = 1 / d(H_v, H_i)^2$, where $H_v$ is the histogram of the current view, $H_i$ is the unedited histogram of the $i$-th neighboring edit, and $d(\cdot)$ is the EMD function. Once we have the target inverse CDF, we fit a linear model of brightness and contrast change to best match the inverse CDF of the current view $C_v^{-1}$ to the target. That is, we seek $\alpha$ and $\beta$ so that $\alpha C_v^{-1} + \beta$ is close to $C_t^{-1}$. We found that a least-squares solution overly emphasizes large differences in the inverse CDFs and does not account for clipping. We instead use an iteratively reweighted least-squares algorithm with weights $\gamma_j$ that are low outside $[0; 1]$ and that decrease the influence of large differences:

$$\gamma_j = \begin{cases} \varepsilon & \text{if } \alpha C_v^{-1}(j) + \beta \notin [0; 1] \\[4pt] \dfrac{1}{\left| \alpha C_v^{-1}(j) + \beta - C_t^{-1}(j) \right|} & \text{otherwise} \end{cases} \qquad (1)$$

where $\varepsilon = 0.001$. If we ignore the weights outside $[0; 1]$, this scheme approximates an $L_1$ minimization [18]. Figure 3 and the companion video illustrate our approach.

Figure 3. We infer viewing parameters from nearby edits performed by the user. Our scheme linearly interpolates the inverse CDFs of the nearby views and fits brightness and contrast parameters to approximate the interpolated inverse CDF in the $L_1$ sense. (The figure shows a 437-megapixel zoomed-out image, two edited views, and a new unedited view with and without our adjustment.)
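Putting Section 4.2 together: interpolate the inverse CDFs of the nearest edits with weights $w_i = 1/d(H_v, H_i)^2$, then fit $\alpha, \beta$ by iteratively reweighted least squares using the weights of Eq. (1). The sketch below (reusing `emd_1d` from the Section 4.1 sketch) is our reconstruction; the iteration count and numerical guards are arbitrary choices.

```python
import numpy as np

def inverse_cdf(hist, n=256):
    """Sample C^{-1} of a normalized histogram at n evenly spaced quantiles."""
    cdf = np.cumsum(hist) / np.sum(hist)
    cdf = cdf + np.arange(len(cdf)) * 1e-9      # enforce strict monotonicity
    return np.interp(np.linspace(0, 1, n), cdf, np.linspace(0, 1, len(hist)))

def target_inverse_cdf(view_hist, neighbor_hists):
    """C_t^{-1} = sum_i w_i C_i^{-1} / sum_i w_i with w_i = 1/d(H_v, H_i)^2."""
    w = np.array([1.0 / max(emd_1d(view_hist, h), 1e-6) ** 2
                  for h in neighbor_hists])
    invs = np.stack([inverse_cdf(h) for h in neighbor_hists])
    return (w[:, None] * invs).sum(axis=0) / w.sum()

def fit_brightness_contrast(inv_v, inv_t, iters=20, eps=1e-3):
    """IRLS fit of alpha, beta so that alpha*C_v^{-1} + beta ~ C_t^{-1},
    with the weights gamma_j of Eq. (1): tiny outside [0, 1], otherwise
    1/|residual|, which approximates an L1 fit."""
    A = np.stack([inv_v, np.ones_like(inv_v)], axis=1)
    gamma = np.ones_like(inv_v)
    alpha, beta = 1.0, 0.0
    for _ in range(iters):
        sw = np.sqrt(gamma)[:, None]            # weighted LS: min sum gamma*r^2
        alpha, beta = np.linalg.lstsq(A * sw, inv_t * sw[:, 0], rcond=None)[0]
        pred = alpha * inv_v + beta
        gamma = np.where((pred < 0) | (pred > 1), eps,
                         1.0 / np.maximum(np.abs(pred - inv_t), 1e-6))
    return alpha, beta
```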

5. Results

The companion video shows a sample editing session with our display-aware editing prototype. The main advantage of our approach is that editing is interactive. In comparison, seamless cloning using Adobe Photoshop can take several minutes for a large copied region. Because of this slowness, retouching with a tool such as Photoshop is limited to the most critical points, and overall the image is left untouched, as it was captured. Our approach addresses this issue and makes it easier to explore creative edits and variations, since feedback is instantaneous.

5.1. Complexity Analysis

We analyze the computational complexity of our editing approach by first looking at the cost of fetching the visible data from our data structure and then at the editing algorithms.

Preparing the Visible Data. For a $w_{dis} \times h_{dis}$ display and $w_{tile} \times h_{tile}$ tiles, the number of tiles that we load is less than $(w_{dis}/w_{tile} + 1) \times (h_{dis}/h_{tile} + 1)$. When we apply geometric transformations to the tiles, these introduce limited deformations and can be taken into account with a small increase of $w_{tile}$ and $h_{tile}$. Since we have pre-computed the tiles at all $1/2^n$ scales, we load at most four times as many pixels as needed. Last, we may have several input images, but we do not load any data for the images outside the current view. Put together, this ensures that we handle an amount of data on the order of $O(k_{dis}\, \aleph_{dis})$, where $k_{dis}$ is the number of visible input images and $\aleph_{dis} = w_{dis} \times h_{dis}$ is the resolution of the display. With our scheme, loading the visible image data has a cost linear in the display size. This is important in applications where images are transmitted, e.g., from a photo-sharing website to a mobile device.
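As a worked example of the tile bound, with hypothetical display and tile sizes:

```python
# Hypothetical numbers: a 1920x1080 display, 256x256 tiles.
w_dis, h_dis, w_tile, h_tile = 1920, 1080, 256, 256
tile_bound = (w_dis / w_tile + 1) * (h_dis / h_tile + 1)  # (7.5+1)*(4.2+1) ~ 44 tiles
# The fetched pyramid level is at most twice the needed resolution per axis,
# hence at most 4x the needed pixels: O(display size), independent of image size.
pixel_bound = 4 * tile_bound * w_tile * h_tile
```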

Editing Operators. The per-pixel processes, such as the viewing adjustments, are in $O(\aleph_{dis})$ since they do a fixed amount of computation for each pixel. Pyramid-based operators run the same process for each pyramid coefficient; since a pyramid has $4/3$ times as many pixels as the image, these operators are also linear in the display resolution $\aleph_{dis}$. The stitching operator processes all the $k_{dis}$ visible images, which introduces a factor $k_{dis}$. This ensures a $O(k_{dis}\, \aleph_{dis})$ complexity, and since loading the data is also linear, our entire pipeline has a linear complexity with respect to the display size.

5.2. Accurate Results from Low Resolution Only

We verify that our operators commute with the screen operator (§ 3.1) by comparing their results computed at full resolution and then rescaled to the screen resolution against the results computed directly from the data at screen resolution. Figure 4 shows that our push-pull compositing produces indistinguishable results in both cases, that is, we can compute the exact result directly at screen resolution without resorting to the full-resolution data. In comparison, the scheme used in Photoshop [19] produces significantly different outputs. We performed the same test for image stitching using Photoshop and our scheme (§ 3.2). Both produce visually indistinguishable results, but Photoshop is significantly slower because even its optimized solver [5] becomes slow on large images, e.g., a minute or more for several high-resolution images. In comparison, our scheme runs interactively and is grounded in a theoretical study (§ 3.1).

Figure 4. For Photoshop and our approach, we compute a composite and then downsample it (shown on the left halves of the images) and compare the output to the composite computed directly on downsampled data (the right halves). Unlike Photoshop (a, 24 dB), our method generates visually indistinguishable images (b, 45 dB).
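The commutation test itself is easy to script. A plausible reading of the dB numbers in Figure 4 is a peak signal-to-noise comparison between the two computation orders; the paper does not spell out its metric, so this is an assumption. A sketch, with `screen` standing for the $s = c \circ \ell$ operator of Section 3.1:

```python
import numpy as np

def psnr(a, b):
    """Peak signal-to-noise ratio for images with values in [0, 1]."""
    return 10 * np.log10(1.0 / np.mean((a - b) ** 2))

def commutation_gap(f, image, screen):
    """Compare s(f(I)) with f(s(I)). A high PSNR means the operator f
    commutes with the screen operator in practice."""
    return psnr(screen(f(image)), f(screen(image)))
```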

5.3. Running Times

We tested our prototype editing system on a Windows PC equipped with an Intel Xeon 3.0 GHz CPU with 16 GB of system memory and an NVIDIA Quadro FX 5800 GPU with 4 GB of graphics memory. Figure 5 reports the performance of our system. We measured the average frame rate of the system while applying the global operators to the image at arbitrary locations. We gradually changed the viewpoint and zoom level during the test to reduce cache-memory effects in a realistic setup. Our timings include data transfers, so we measure the time that a user actually perceives when working with our prototype; note that I/O operations are often excluded from the measurements of other methods, e.g., [16]. We tested the operators on five different screen sizes, from 512 × 384 (0.2 megapixels) to 2048 × 1536 (3 megapixels), and three different sizes of input images, from 0.3 to 16 gigapixels. The results show the benefit of display-aware editing: the frame rate is not affected by the input image size (the three plots are almost identical in Figure 5) but is highly correlated with the screen size (frame rates drop as the screen size increases in Figure 5). Note that the 16-gigapixel brain image is much larger than the graphics memory we used, but the frame rate is similar to that of a 0.3-gigapixel image. In addition, the construction of the Gaussian and Laplacian pyramids for a 1024 × 768 screen resolution took only 11 ms, which enables the execution of pyramid-based image operators on the fly without a pre-built global image pyramid. Our on-the-fly image registration runs on a fixed-size grid, is highly parallelizable, and takes 50 to 100 ms in our prototype implementation. The numbers in Figure 5 show that our algorithms are fast and that our data-management strategy successfully prevents data starvation.

Figure 5. Timings of the global operators running on various screen and input sizes (frame rate vs. screen size in megapixels, for the Notre Dame 0.3 GB, Stanford 2.2 GB, and Brain 16 GB datasets). The performance of our display-aware algorithms depends on the screen size and not on the size of the data.

5.4. Validation of our Interpolation Scheme

We validate our algorithm for propagating viewing parameters on the user-study data described in Section 4.1. The data consist of edits from 20 users on 5 views from each of 10 different panoramas (a total of 50 images). Using a leave-one-out strategy for each panorama, we predict one view using the user edits from the 4 other views. We use the Earth Mover's Distance between the histograms of our predicted edit and the user's actual edit to quantify the accuracy of our prediction. On average, the difference is 3.0 with a standard deviation of 1.9. We compared our interpolation scheme to simply interpolating the users' brightness and contrast adjustments between views (i.e., interpolating the slider positions instead of the histograms). Compared to the users' actual edits, this interpolation scheme produced an average error of 3.7 with a standard deviation of 2.1. A two-sample t-test confirms that our histogram interpolation scheme has a lower error than interpolating the adjustments, with a p-value below $10^{-8}$. To put these errors into perspective, we conducted a second study in which users edited 20 images comprising 5 images appearing twice and 10 distractors. The image order was randomized such that repeated images were not back to back. We collected 250 repeated measurements; on average, the difference was 2.8 with a standard deviation of 2.3. This result shows that our scheme reproduces users' adjustments within a margin comparable to their own repeatability. Figure 6 illustrates this point.

Figure 6. Distribution of the differences between users' edits and our predictions, and between users' edits on repeated images (frequency over Earth Mover's Distance; see text for details). The similarity between these distributions indicates that our edit propagation reproduces users' adjustments.

6. Conclusions and Future Work

Our display-aware editing approach effectively handles images that otherwise would be difficult and slow to process. A large part of the benefit comes from the fact that we process only the visible data. When one needs the whole image at full resolution, for instance to print a poster, we have to touch every single pixel and the running times are slower. Even in those cases our method remains fast, since our editing algorithms are data-parallel. In addition, all our algorithms use the same scale-space data structure and apply similar operations to it, which makes data management and out-of-core processing easier. We envision a workflow in which the user first edits the image on screen, enjoying the speed of our display-aware approach, and runs a final rendering at the end, just before sending the result to an output device such as a printer.

Although we have shown that we can support a variety of tasks with our display-aware approach, a few cases remain difficult. Optimization-based techniques require access to every pixel, which makes them overly slow on large images. This prevents the use of some algorithms such as error-tolerant and highly discriminative selections [6, 26]. Related to this issue, algorithms akin to histogram equalization manipulate every pixel and become impractical on large images. A solution is to apply them at lower resolution and to upsample their results [23]. Nonetheless, developing display-aware versions of these algorithms is an interesting avenue for future work. We also imagine that other novel display-aware algorithms will be developed in the future. In this paper we applied operators simply in spatial and temporal order, but we will investigate more complicated editing scenarios and ordering-optimization schemes. Ultimately, processing and data storage are getting cheaper, making the need for on-the-fly computation of large images more pressing. In addition, we envision that our framework could be efficiently implemented to edit high-resolution photographs, e.g., from a digital SLR, on commodity mobile devices.

Acknowledgments. We thank Min H. Kim for his help producing the supplementary video. Insu Yu was in part supported by EPSRC grant EP/E047343/1. Won-Ki Jeong and Hanspeter Pfister were partially supported by the Intel Science and Technology Center for Visual Computing, NVIDIA, and the National Institutes of Health under Grant No. HHSN276201000488P. This material is based upon work supported by the National Science Foundation under Grant No. 0739255.

References

[1] GigaPan. www.gigapan.org.
[2] PixOrb. www.peaceriverstudios.com/pixorb.
[3] A. Adams. The Print: Contact Printing and Enlarging. Morgan and Lester, 1950.
[4] A. Adams, N. Gelfand, J. Dolson, and M. Levoy. Gaussian KD-trees for fast high-dimensional filtering. ACM Trans. on Graphics, 28(3), 2009.
[5] A. Agarwala. Efficient gradient-domain compositing using quadtrees. ACM Trans. on Graphics, 26(3), 2007.
[6] X. An and F. Pellacini. AppProp: All-pairs appearance-space edit propagation. ACM Trans. on Graphics, 27(3), 2008.
[7] S. Bae, S. Paris, and F. Durand. Two-scale tone management for photographic look. ACM Trans. on Graphics, 25(3), 2006.
[8] I. Bankman, editor. Handbook of Medical Imaging: Processing and Analysis Management (Biomedical Engineering). Academic Press, 2000.
[9] C. Barnes, E. Shechtman, A. Finkelstein, and D. B. Goldman. PatchMatch: A randomized correspondence algorithm for structural image editing. ACM Trans. on Graphics, 28(3), 2009.
[10] D. F. Berman, J. T. Bartell, and D. H. Salesin. Multiresolution painting and compositing. In Proc. of ACM SIGGRAPH, 1994.
[11] M. Brown and D. G. Lowe. Automatic panoramic image stitching using invariant features. Int. Journal of Comp. Vision, 74(1), 2007.
[12] P. J. Burt. Moment images, polynomial fit filters, and the problem of surface interpolation. In Proc. of Comp. Vision and Pattern Recognition, 1988.
[13] P. J. Burt and E. H. Adelson. A multiresolution spline with application to image mosaics. ACM Trans. on Graphics, 2(4), 1983.
[14] J. Chen, S. Paris, and F. Durand. Real-time edge-aware image processing with the bilateral grid. ACM Trans. on Graphics, 26(3), 2007.
[15] A. Criminisi, T. Sharp, C. Rother, and P. Perez. Geodesic image and video editing. ACM Trans. on Graphics, 2010.
[16] Z. Farbman, G. Hoffer, Y. Lipman, D. Cohen-Or, and D. Lischinski. Coordinates for instant image cloning. ACM Trans. on Graphics, 28(3), 2009.
[17] R. Fattal. Edge-avoiding wavelets and their applications. ACM Trans. on Graphics, 28(3), 2009.
[18] J. Gentle. Matrix Algebra, chapter 6.8.1, Solutions that Minimize Other Norms of the Residuals. Springer, 2007.
[19] T. Georgiev. Covariant derivatives and vision. In Proc. of the European Conf. on Comp. Vision, 2006.
[20] K. J. Hayworth, N. Kasthuri, R. Schalek, and J. W. Lichtman. Automating the collection of ultrathin serial sections for large volume TEM reconstructions. In Microscopy and Microanalysis, 2006.
[21] W.-K. Jeong, J. Schneider, S. G. Turney, B. E. Faulkner-Jones, D. Meyer, R. Westermann, R. C. Reid, J. Lichtman, and H. Pfister. Interactive histology of large-scale biomedical image stacks. IEEE Trans. on Visualization and Comp. Graphics, 16(6), 2010.
[22] M. Kazhdan and H. Hoppe. Streaming multigrid for gradient-domain operations on large images. ACM Trans. on Graphics, 27(3), 2008.
[23] J. Kopf, M. F. Cohen, D. Lischinski, and M. Uyttendaele. Joint bilateral upsampling. ACM Trans. on Graphics, 26(3), 2007.
[24] J. Kopf, M. Uyttendaele, O. Deussen, and M. F. Cohen. Capturing and viewing gigapixel images. ACM Trans. on Graphics, 26(3), 2007.
[25] A. Levin, A. Zomet, S. Peleg, and Y. Weiss. Seamless image stitching in the gradient domain. In Proc. of the European Conf. on Comp. Vision, 2006.
[26] Y. Li, J. Sun, C.-K. Tang, and H.-Y. Shum. Lazy snapping. ACM Trans. on Graphics, 23(3), 2004.
[27] H. Ling and K. Okada. An efficient earth mover's distance algorithm for robust histogram comparison. IEEE Trans. on Pattern Analysis and Machine Intelligence, 29(5), 2007.
[28] J. Liu, J. Sun, and H.-Y. Shum. Paint selection. ACM Trans. on Graphics, 28(3), 2009.
[29] W. Matusik, M. Zwicker, and F. Durand. Texture design using a simplicial complex of morphable textures. ACM Trans. on Graphics, 24(3), 2005.
[30] J. McCann and N. S. Pollard. Real-time gradient-domain painting. ACM Trans. on Graphics, 27(3), 2008.
[31] A. Oliva and A. Torralba. Modeling the shape of the scene: a holistic representation of the spatial envelope. Int. Journal of Comp. Vision, 42(3), 2001.
[32] S. Paris and F. Durand. A fast approximation of the bilateral filter using a signal processing approach. Int. Journal of Comp. Vision, 2009.
[33] P. Perez, M. Gangnet, and A. Blake. Poisson image editing. ACM Trans. on Graphics, 22(3), 2003.
[34] S. Pinheiro and L. Velho. A virtual memory system for real-time visualization of multi-resolution 2D objects. Journal of WSCG, 2002.
[35] M. A. Shantzis. A model for efficient and flexible image computing. In Proc. of ACM SIGGRAPH, 1994.
[36] L. Velho and K. Perlin. B-spline wavelet paint. Revista de Informatica Teorica e Aplicada, 2002.
[37] S. Wang and W. Heidrich. The design of an inexpensive very high resolution scan camera system. In Proc. of Eurographics, 2004.