
Tomorrow’s Photoshop Effects

Johannes Borodajkewycz∗

TU Wien


Figure 1: Examples of two of the techniques presented in this paper: (a) Interactive image completion with perspective correction is able to fill missing regions in high-resolution images with the help of user-supplied perspective information. (b) Seam carving for content-aware image resizing allows users to resize images in a content-aware manner.

Abstract

This report aims to give an overview of possible future effects for image editing software like Adobe's Photoshop. It covers the fields of image compositing, image completion, image deformation, gradient-based image editing and image resizing. Several new approaches are described in detail. Soft scissors is a new interactive matting approach that lets users generate high-quality mattes in real time. Drag and drop pasting is a gradient-based image compositing method that allows for the pasting of roughly selected objects. It achieves this by generating an optimal boundary around the object that avoids salient image structures as far as possible. Patch transform is an approach to several image modifications made possible by disassembling the image into small, non-overlapping patches. The algorithm is able to rearrange the patches according to user-supplied constraints. Image completion with perspective correction is an exemplar-based image completion approach that estimates and applies perspective correction when copying fragments. Detail-preserving shape deformation is an image deformation approach that preserves texture detail by decoupling the deformation process from the pixel generation and resynthesizing texture information. Real-time gradient-domain painting lets users directly manipulate gradients, similar to an intensity-based paint program. Image retargeting using seam carving is an image resizing technique based on the removal of pixel seams. By removing seams of low-energy pixels the method is able to preserve the content of the image.

Keywords: image editing, matting, image compositing, Poisson image editing, image retargeting, seam carving, image completion, patch transform, image deformation, texture synthesis

∗e-mail: [email protected]

1 Introduction

Adobe's Photoshop [Adobe Systems 2008] is a well-known, leading photo-processing software. It includes a wide range of standard filters and effects and has added more and more advanced features in recent years. This seminar work deals with several effects for photo-processing that have the potential to be included in future software products.

Photoshop is the flagship of Adobe Systems' product line and has been the market leader for commercial bitmap and image manipulation for over a decade.

Photoshop's history began in 1987 when the brothers Thomas and John Knoll started development on the first version of their image editing tool for the Macintosh Plus. Going by various names over the years, it was finally licensed by Adobe in 1988 and released as Photoshop 1.0 in 1990. Various improvements led to the release of Photoshop 2.0 and a Windows and Solaris port released as version 2.5. With the third release in 1994 Adobe finally established Photoshop's success with the introduction of layers, which became a major selling point for Photoshop among professional artists. The following iterations until Photoshop 7 saw continual improvement and new features like the History palette for multiple undos as well as the inclusion of web-specific features due to the bundling with ImageReady. Since version 8 Photoshop has been included in Adobe's Creative Suite (CS), and the version number was therefore abandoned in favour of the new name Photoshop CS, followed by CS2 and CS3. New additions included the ability to import RAW formats from different digital cameras and the introduction of Smart Objects and High Dynamic Range Image (HDRI) support in CS2. CS3 saw the introduction of Auto-Blend and Auto-Align to create panorama images with the popular Photomerge feature. The latest version, Photoshop CS4,


was officially released on the 15th of October 2008, including Content-Aware Scaling, which is described in detail in Section 6.2.

Photoshop continues to be one of the most powerful image manipulation tools and provides a vast selection of different effects and functions for most common image editing problems. Photoshop's sophisticated layer architecture allows users to manipulate separate images in different layers in order to modify sections without editing the entire image. Additionally, Photoshop provides a growing range of image manipulation tools. Besides the standard drawing tools, like the paint tools for freehand drawing and selection tools, it includes a wide range of filters and image enhancement tools.

In recent years the competition has grown steeper for the market leader. Free software like the GNU Image Manipulation Program (GIMP) [GIMP Development Team 2008] offers similar power at no cost, and various lower-cost applications aimed at amateurs and home users, like Paint Shop Pro [Corel 2008], have improved their scope of services. Adobe reacted by launching the budget-priced but feature-reduced Photoshop Elements targeted at that same audience. Consequently, new developments for Photoshop are well-guarded secrets and one can only guess in which direction future improvements will go.

To be included in commercial image editing software, new applications must feature meaningful user interaction and compute in a reasonable amount of time. Finally, new features must be useful and desired by the user.

Presented in this paper are some approaches and methods that could enrich the assortment of future Photoshop releases.

Section 2 deals with the field of image compositing. The section includes image matting to extract difficult foreground objects, gradient-based compositing to seamlessly blend multiple images together, and a new approach based on the disassembly of images into small patches and their automatic rearrangement by an advanced algorithm. An overview of the problem behind image matting is given in section 2.1. Section 2.1.1 details the different methods developed to solve the problem. Section 2.1.2 presents a new interactive matting system called soft scissors [Wang et al. 2007]. In section 2.2 the general approach of gradient-based compositing is outlined, and drag and drop pasting [Jia et al. 2006] is explained in detail in section 2.2.1. Section 2.3 is dedicated to a novel approach to several different image modifications called patch transform [Cho et al. 2008].

Section 3 looks at the problem of image completion, with the related field of texture synthesis explained in section 3.1 and an overview of different approaches to solve the problem in section 3.2. One of these approaches, called interactive image completion with perspective correction [Pavic et al. 2006], is reviewed in detail in section 3.2.1.

Section 4 deals with image deformation; section 4.1 covers detail-preserving shape deformation [Fang and Hart 2007], which is able to faithfully preserve texture detail by decoupling the deformation of features from pixel color generation.

Section 5 takes another look at gradient-based image editing, this time through the new interactive real-time gradient-domain painting system [McCann and Pollard 2008], which makes the direct manipulation of image gradients possible. A detailed look at the system is given in section 5.1.

Section 6 deals with the new demands on image resizing and, in section 6.1, the problems traditional approaches have in accurately preserving the content of images. Finally, section 6.2 presents the new image retargeting using seam carving [Avidan and Shamir 2007] and its capabilities.

2 Image Compositing

Image compositing is a collective term for a vast number of methods that digitally assemble one or multiple images into a new composite image. Photoshop already features a powerful image compositing method with layer masks. By applying a layer mask to a foreground layer the user can adjust the opacity of that layer. Yet a good layer mask is hand-painted and takes even an adept Photoshop user some time and effort to generate. It is even more challenging when the boundary of the object is difficult to trace, as for fur or hair. For the special task of extracting such objects, image matting techniques have been developed. Section 2.1 presents the general problem of image matting and Section 2.1.1 gives an overview of some of these techniques. A new interactive matting algorithm called soft scissors [Wang et al. 2007] is detailed in Section 2.1.2.

Another Photoshop tool for image compositing is the Gradient Layer. This layer can be used to softly blend multiple images together. In combination with the layer mask, the gradient can smoothly fade out objects. Yet those tasks might be challenging for an untrained user. Since CS3 Photoshop includes the Auto-Blend Layer, which can seamlessly blend together two overlapping images, simplifying the process. These two features work similarly to the gradient-based image editing detailed in Section 2.2 and the drag and drop pasting technique [Jia et al. 2006] that lets users select and paste objects into new backgrounds with relative ease. Section 2.3 introduces another novel approach called patch transform [Cho et al. 2008] that breaks the image into small non-overlapping patches and lets the user manipulate the image content in this patch domain.

2.1 Image Matting

Image matting is the process of extracting an accurate foreground object from an image. Specifically, an input image C is a convex combination of a foreground image F and a background image B:

Cp = αpFp +(1−αp)Bp, (1)

where p is the pixel location and αp is the foreground opacity of the pixel. Alpha is one for opaque foreground and zero for completely transparent foreground. In blue-screen matting B is known, since one works in a user-controlled environment, but in natural image matting the problem is severely under-constrained. All quantities on the right-hand side of the compositing equation are unknown, which leads to three equations, one per color channel, with seven unknowns. However, the strong correlation between neighbouring pixels can be used to mitigate these difficulties. Additionally, further information like multiple backgrounds or flash and no-flash image pairs can be used to help determine the unknowns in the matting equation. This paper, however, will focus on approaches that solve the matting problem from a single input image.
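As an illustration of the compositing equation (1), the following sketch blends a foreground over a background per pixel; the array values and names here are illustrative, not taken from any of the cited papers:

```python
import numpy as np

def composite(alpha, foreground, background):
    """Per-pixel convex combination C_p = alpha_p * F_p + (1 - alpha_p) * B_p.

    alpha: (H, W) array in [0, 1]; foreground, background: (H, W, 3) arrays.
    """
    a = alpha[..., None]  # broadcast alpha over the three color channels
    return a * foreground + (1.0 - a) * background

# A 1x2 toy image: left pixel fully opaque, right pixel half transparent.
F = np.array([[[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]])  # red foreground
B = np.array([[[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]])  # blue background
alpha = np.array([[1.0, 0.5]])
C = composite(alpha, F, B)  # left: pure red; right: 50/50 red-blue blend
```

Matting is the inverse task: given only C, recover plausible α, F and B.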

Most of these matting techniques depend on user interaction to supply a trimap. A trimap is a rough segmentation of the image and defines for each pixel whether it definitely belongs to the foreground, definitely belongs to the background, or is unknown. Information from the foreground and background is then used to determine the unknown regions. If the results turn out unsatisfactory, the trimap has to be improved and the algorithm is executed again. However, recomputing the whole matte at each iteration can feel quite ineffective to the user.


2.1.1 Matting Techniques

Based on how they apply natural image statistics, most matting techniques can be classified into two rough categories: sampling-based approaches and propagation-based approaches. Methods that explicitly estimate the color of unknown pixels by sampling representative foreground and background pixels in the vicinity are summarized as sampling-based approaches. They use these color samples to directly estimate the alpha values.

The established Bayesian Matting [Chuang et al. 2001] first identifies the local color distribution by constructing oriented Gaussian distributions from F, B and α values, starting from the known foreground and background regions. A maximum a posteriori (MAP) estimation is then calculated in a well-defined Bayesian framework, estimating the unknown F, B and α as the most probable ones for the given distribution. This method works well with a smaller unknown region in the trimap and non-overlapping color distributions of the foreground and background. Yet given an image with sparse constraints the Bayesian method produces an erroneous matte.

Another example is the more recently proposed Iterative Optimization Approach for Unified Image Segmentation and Matting [Wang and Cohen 2005]. This approach argues that the construction of a trimap becomes increasingly difficult for images that contain large semi-transparent regions or foreground objects with many holes. They propose an approach that forgoes the pre-segmentation of the image. Instead they combine the segmentation and the matting problem and propose a unified optimization approach based on belief propagation. The user provides a small sample of definite foreground and background pixels from which the opacity values of every pixel are estimated iteratively.

Pixels are divided into two groups: Uc for pixels whose alpha values have been estimated in previous iterations and Un for pixels not yet considered. Each pixel is also assigned an uncertainty factor between 0 and 1. During each iteration, pixels from Un near pixels from Uc are added to Uc and their values estimated. The algorithm stops when Un is empty and the uncertainty of the image cannot be reduced any further. Essentially, by estimating alpha values for pixels near the user-marked ones first, these pixels can then help to estimate alpha values for pixels further away from the marked regions. This way the alpha map is propagated from pixels with high confidence to the rest of the image. Furthermore, the constraints of smoothness over the alpha matte and color accuracy are applied as optimization objectives.


Figure 2: Belief Propagation Matting: A matte produced for a spiderweb. (a) Original image with a few user-marked regions. (b) The finished matte.

Ultimately the approach shows good results for objects like the spiderweb shown in Figure 2, yet fails for images where foreground and background colors are too ambiguous.

Propagation-based methods, on the other hand, do not explicitly estimate foreground and background colors. They look for ways to systematically eliminate foreground and background colors from the optimization process and solve for the matte in closed form, most commonly by applying smoothness constraints over the foreground and background colors.

Poisson Matting [Sun et al. 2004] assumes that intensity changes in the foreground and background are smooth. Given a user-supplied trimap, an approximate gradient field of the matte is computed from the input image. The matte is then obtained by solving Poisson equations. Additional user refinement of the matte is possible by applying local Poisson matting, which manipulates a continuous gradient field in a local region. Local modifications in a gradient field can be seamlessly propagated into the matte. However, Poisson matting has difficulties producing good results when the foreground and background colors are very similar. Additionally, difficult situations require increased user interaction to improve the matte.
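The approximation behind the matte gradient field can be sketched as follows: differentiating the compositing equation and treating F and B as locally smooth leaves ∇α ≈ ∇C/(F − B); the matte itself is then recovered by solving a Poisson equation with the trimap supplying boundary values. A minimal grayscale sketch (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def approx_matte_gradient(image, F_est, B_est, eps=1e-6):
    """Poisson-matting-style approximation: differentiating
    C = alpha*F + (1 - alpha)*B and assuming F and B vary smoothly gives
    grad(alpha) ~ grad(C) / (F - B). All inputs are (H, W) grayscale arrays;
    F_est and B_est would come from the known trimap regions in practice.
    """
    gy, gx = np.gradient(image)
    denom = F_est - B_est
    denom = np.where(np.abs(denom) < eps, eps, denom)  # avoid division by zero
    return gy / denom, gx / denom

# Sanity check: with constant F = 1 and B = 0, the image equals the matte,
# so the approximated matte gradient is simply the image gradient.
alpha = np.tile(np.linspace(0.0, 1.0, 8), (8, 1))  # horizontal alpha ramp
image = alpha * 1.0 + (1.0 - alpha) * 0.0
gy, gx = approx_matte_gradient(image, np.ones_like(image), np.zeros_like(image))
```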

A closed-form solution to natural image matting, as presented by Levin et al. [Levin et al. 2008], assumes smoothness of foreground and background colors. Assuming that foreground and background colors are approximately constant over a small window around each pixel, they obtain a sparse linear system. F and B are analytically eliminated from this system, yielding a quadratic cost function in α. The global optimum of this function is the sought alpha matte. Closed-form matting does not need a sophisticated trimap and manages with just a small amount of user input.

2.1.2 Soft Scissors

Most of the more recent matting algorithms mainly improve the quality of the matte through more sophisticated analysis and optimization methods. Soft scissors [Wang et al. 2007], on the other hand, is the first tool that introduces an interactive way of extracting alpha mattes in real time. The system is very fast; the example shown in Figure 3 took about 40 seconds from generating the matte to the final composite. Aided by an intelligent interface similar to a brush tool, users are able to construct the mattes by roughly painting along the edge of the foreground object. The system automatically adjusts the width and boundary conditions of the scissoring brush depending on the width of the foreground boundary along the path ahead.

Building on their own offline robust matting algorithm [Wang and Cohen 2007], which combined the color sampling method with propagation-based approaches, Wang and Cohen adapt it to provide incremental matte estimation as well as incremental foreground color estimation. The system assumes that a stroke made by the user with the scissoring tool implicitly defines a trimap, as shown in the illustration in Figure 5. The left edge of the stroke is assumed to lie in the background while the right edge is assumed to lie in the foreground. The middle of the stroke represents the unknown region.

As the user paints along the edges, on each iteration the system generates a new input region Mt from the newly painted pixels. The newly marked background and foreground pixels are added to the foreground and background color examples as well as the boundary conditions for the local input area. The newly marked unknown pixels are assumed to affect all of the correlated pixels that were previously marked as unknown. An update-region solver then computes the small region for which the alpha values need to be updated. The alpha values are estimated by the robust matting algorithm for all pixels inside the matting region Ωt. These newly computed alpha values are finally used by the foreground color solver to estimate the foreground color of pixels in Ωt. This step removes the old background colors from the foreground to ready it for compositing with a new background and improves the


Figure 3: Soft Scissors: (a) Generating the matte by tracing the boundary of the object. (b) The novel composite. (c) The object with the new background.

Figure 4: Soft Scissors: Flowchart of the system

Figure 5: Soft Scissors: The trimap of the soft scissors system generated along the user stroke.

quality of the matte. The flowchart of the system is shown in Figure 4. The matte, foreground color and update region are solved as soft graph labeling problems with graph structures.

The graph for solving the matte, as shown in Figure 7(a), is constructed with virtual nodes Ωf and Ωb which represent pure foreground and background respectively. The rest of the nodes represent either unknown pixels or boundary nodes. The alpha values for boundary nodes are fixed over one iteration and include not only user-marked foreground and background pixels but also unknown pixels along the boundary of Ωt whose alpha value has been estimated in previous iterations. This guarantees that the matte is smooth across the entire boundary of Ωt. Based on non-parametric models of color distribution, data weights are assigned between each pixel i and the virtual nodes. This way pixels similar to foreground (or background) colors are more likely to receive higher

(or lower) alpha values. Additionally, an edge weight Wi,j, which constrains nearby pixels to have similar alpha values, is assigned between each pair of neighbouring pixels i and j; the formulation is based on closed-form matting [Levin et al. 2008]. Each unknown pixel is connected to its 25 spatial neighbours.

The graph labeling problem of the constructed graph is then solved using the random walk algorithm [Grady 2006]. The alpha values are computed by placing a random walker at pixel i, which can reach any connected node j with probability Wi,j/∑j Wi,j, until it reaches one of the boundary nodes. By solving the linear system proposed by Grady [Grady 2006], the alpha value of pixel i is determined as the probability that the walker ends up at the foreground virtual node.
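The random-walk labeling reduces to a linear (Dirichlet) system on the graph Laplacian. The following toy sketch, which assumes a small dense weight matrix rather than the paper's sparse 25-neighbour graph, shows the computation:

```python
import numpy as np

def random_walk_alpha(W, boundary, boundary_alpha):
    """Solve the random-walk labeling on a small dense graph.

    W: (n, n) symmetric non-negative edge-weight matrix.
    boundary: indices of nodes with fixed alpha (user-marked / virtual nodes).
    boundary_alpha: fixed alpha values for those nodes.
    Each unknown node receives the probability that a random walker starting
    there reaches an alpha = 1 boundary node before an alpha = 0 one.
    """
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W  # combinatorial graph Laplacian
    unknown = np.setdiff1d(np.arange(n), boundary)
    a = np.zeros(n)
    a[boundary] = boundary_alpha
    # Dirichlet problem: L_uu * a_u = -L_ub * a_b
    L_uu = L[np.ix_(unknown, unknown)]
    L_ub = L[np.ix_(unknown, boundary)]
    a[unknown] = np.linalg.solve(L_uu, -L_ub @ np.asarray(boundary_alpha, float))
    return a

# Chain 0-1-2-3 with unit weights; node 0 fixed to background (0),
# node 3 fixed to foreground (1): alpha interpolates linearly along the chain.
W = np.zeros((4, 4))
for i in range(3):
    W[i, i + 1] = W[i + 1, i] = 1.0
alpha = random_walk_alpha(W, [0, 3], [0.0, 1.0])  # -> [0, 1/3, 2/3, 1]
```

On an image, the same system would be assembled sparsely, with one row per unknown pixel.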

Soft scissors also estimates the true foreground color F for each pixel in the unknown region. This also optimizes the matte by removing artefacts generated during the matte estimation. This can be seen in Figure 6.

Figure 6: Soft Scissors: The estimated foreground colors after the matte estimation and after the true foreground color estimation.

A second graph, as shown in Figure 7(b), is constructed and solved


using random walk for the three foreground color channels individually. Only pixels in Ωt whose alpha value is between 0 and 1 are treated as unknown pixels, connected to their 4 spatial neighbours. Edges are assigned a color edge weight Wi,j = |ai − aj| + ε, where ε is a small value ensuring the weight is greater than zero. The boundary pixels in this step are either foreground pixels or background pixels. The foreground pixels use their true colors as boundary conditions, whereas the background pixels use the foreground colors initially estimated in the matte estimation step as boundary conditions.

To compute the matte incrementally, soft scissors introduces the update region solver. With each new input region, an update region Ωt is computed from the pixels that might be affected by the new information. This is solved as yet another graph labeling problem, as shown in Figure 7(c). Newly marked pixels of the current iteration are treated as boundary pixels with a label of 1. The label now represents the impact of the new input region on the pixel. Previously marked pixels are treated as unknown. As defined in the matte estimation step, the graph consists of each pixel connected to its 25 spatial neighbours with the same edge weights Wi,j. By solving the graph, each pixel is assigned a label measuring how much impact the new region has on it. All pixels with non-zero labels then form the new matting region Ωt.

Figure 7: Soft Scissors: The graphs used to solve (a) the matte, (b) the foreground colors, (c) the update region.

The soft scissors interface contains the brush with which the user marks the edges of the foreground object. Both the width of the brush and the boundary conditions for the trimap, defined by the brush stroke, are adjusted automatically according to local statistics near the current brush stroke.

For fuzzy objects a wider brush is necessary, while sharper edges demand a narrow one. Brush width is automatically determined at each time t by creating a wide "look-ahead" region, shown in purple in Figure 8(a), following the path defined by the current scissor direction. An alpha estimation is conducted for all pixels in this region. To estimate the matte profile, pixels along the lines perpendicular to the current brush direction are sampled. The width of the brush is adjusted to cover all pixels with fractional alpha estimates, as shown in Figure 8(b). At each sampled point a weight wp is computed as wp = 0.5 − |αp − 0.5|, where αp is the estimated alpha value of the point. The centre of the line distribution is computed as x̄ = ∑p wp xp / ∑p wp, where xp is the distance from a sampling point to the extended scissor path. The final width of the alpha profile is estimated as 4 √(∑p wp (xp − x̄)² / ∑p wp).
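The width estimation can be sketched directly from these formulas (function and variable names are illustrative):

```python
import numpy as np

def alpha_profile_width(alphas, distances):
    """Estimate brush width from alpha values sampled along a line
    perpendicular to the stroke. Following the text: each sample gets weight
    w_p = 0.5 - |alpha_p - 0.5| (largest for mixed pixels), the profile centre
    is the weighted mean of the distances, and the final width is
    4 * sqrt(weighted variance of the distances).
    """
    a = np.asarray(alphas, float)
    x = np.asarray(distances, float)
    w = 0.5 - np.abs(a - 0.5)
    x_bar = np.sum(w * x) / np.sum(w)
    width = 4.0 * np.sqrt(np.sum(w * (x - x_bar) ** 2) / np.sum(w))
    return width, x_bar

# A symmetric edge profile centred on the stroke path:
width, centre = alpha_profile_width([0.0, 0.25, 0.5, 0.75, 1.0],
                                    [-2.0, -1.0, 0.0, 1.0, 2.0])
```

Fully opaque and fully transparent samples get zero weight, so only the mixed-pixel band determines the width.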

The boundary conditions of the scissor brush strokes are initially set so that the left edge of the brush lies in the background region and the right edge in the foreground. However, users might have to follow thin structures, which would place both edges in the background region. The system also has to identify whether the stroke direction is reversed and switch the boundary conditions accordingly. To automatically determine the boundary conditions, a color model is created from foreground and background colors after the user has made a short stroke under the initial assumptions.

Gaussian mixture models (GMMs) are generated for both foreground and background colors, and for each new brush position the colors of the pixels at the edge of the brush are compared to the models. Whether the average color is closer to the foreground or the background GMM determines the boundary condition. The distribution of the GMMs is shown in Figure 8(a), and an example of following a thin structure, where the boundary conditions have to be set to background on both edges of the brush, is shown in Figure 8(c). Of course the GMMs are updated periodically and re-computed after a set number of samples.
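The boundary-condition decision can be sketched as a likelihood comparison. For brevity a single Gaussian per class stands in for the paper's full mixture models, and all names are illustrative:

```python
import numpy as np

def fit_gaussian(colors):
    """Single-Gaussian stand-in for the paper's GMMs (illustration only)."""
    mu = colors.mean(axis=0)
    cov = np.cov(colors.T) + 1e-6 * np.eye(colors.shape[1])  # regularize
    return mu, cov

def log_likelihood(color, mu, cov):
    d = color - mu
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d @ np.linalg.inv(cov) @ d + logdet + len(d) * np.log(2 * np.pi))

def classify_edge(avg_color, fg_colors, bg_colors):
    """Decide whether a brush-edge color is closer to the foreground or the
    background model; this sets the boundary condition for that edge."""
    fg = log_likelihood(avg_color, *fit_gaussian(fg_colors))
    bg = log_likelihood(avg_color, *fit_gaussian(bg_colors))
    return "foreground" if fg > bg else "background"

rng = np.random.default_rng(0)
fg = rng.normal([0.9, 0.2, 0.2], 0.05, size=(200, 3))  # reddish foreground
bg = rng.normal([0.2, 0.2, 0.9], 0.05, size=(200, 3))  # bluish background
label = classify_edge(np.array([0.85, 0.25, 0.2]), fg, bg)  # "foreground"
```

When both brush edges classify as background, the system is following a thin structure, as in Figure 8(c).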

Figure 8: Soft Scissors: (a) The system automatically adjusts brush width and boundary conditions. (b) Example of enlarged brush width. (c) Example of changing the boundary conditions.

The automatic adjustments are constantly monitored and, if found unreliable, are turned off. In addition, users retain full control at all times and can set parameters manually.

The evaluations of Wang et al. show that soft scissors not only has improved usability compared to other matting algorithms but also that the quality of its mattes is among the best.

2.2 Gradient Based Compositing

Working in the gradient domain has become a favoured technique of many image editing algorithms. Instead of working with the absolute colors of the images, the color gradients between pixels are used to form a composite vector field. By solving Poisson equations, a best-fit composite image can be reconstructed from this vector field. Among the first to use a system of Poisson equations for seamless editing of image regions were Perez et al., who proposed Poisson Image Editing [Perez et al. 2003].
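Reconstructing an image from a guidance gradient field amounts to solving the Poisson equation Δf = div v, with the region border supplying Dirichlet boundary conditions. A dense sketch for small images (not the cited papers' solver; names are illustrative):

```python
import numpy as np

def poisson_reconstruct(gx, gy, boundary):
    """Least-squares reconstruction of an image from a target gradient field
    (gx: forward differences along x, gy: along y) with fixed border values.
    Solves laplace(f) = div(v) on the interior via a dense linear system,
    so this is only practical for small images.
    """
    H, W = boundary.shape
    f = boundary.copy()
    interior = [(y, x) for y in range(1, H - 1) for x in range(1, W - 1)]
    idx = -np.ones((H, W), int)
    for k, (y, x) in enumerate(interior):
        idx[y, x] = k
    n = len(interior)
    A = np.zeros((n, n))
    b = np.zeros(n)
    for k, (y, x) in enumerate(interior):
        A[k, k] = -4.0
        # divergence of the guidance field at (y, x)
        b[k] = (gx[y, x] - gx[y, x - 1]) + (gy[y, x] - gy[y - 1, x])
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            j = idx[y + dy, x + dx]
            if j >= 0:
                A[k, j] = 1.0       # unknown neighbour
            else:
                b[k] -= boundary[y + dy, x + dx]  # known border value
    f_int = np.linalg.solve(A, b)
    for k, (y, x) in enumerate(interior):
        f[y, x] = f_int[k]
    return f

# Recover a linear ramp from its own gradients: the reconstruction matches.
f_true = np.add.outer(2.0 * np.arange(5.0), np.arange(5.0))
rec = poisson_reconstruct(np.diff(f_true, axis=1), np.diff(f_true, axis=0), f_true)
```

Seamless pasting follows the same pattern: the guidance field is taken from the source image while the boundary values come from the target.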

While good matting techniques are able to extract even complicated objects from a natural image and composite them onto new backgrounds, they do not take into account how to adapt them to a new background with different lighting conditions. Moreover, if shadows and parts of the background are to be part of the new composite, a form of marking objects other than matting might be preferred. Poisson Image Editing allows users to bring whole regions of interest into a target image and seamlessly blends the colors from both images without visible discontinuities around the borders. However, its effectiveness depends heavily on how precisely the boundary of the region was drawn by the user. Analysing these limitations, Jia et al. have recently proposed an improvement of the technique called Drag and Drop Pasting [Jia et al. 2006].



Figure 9: Drag and Drop Pasting: (a) The source image. (b) The user-drawn boundary, roughly selecting the object of interest. (c) The result of Poisson image editing shows some unsatisfactory artefacts. (d) The optimized boundary created by Drag and Drop Pasting. (e) The resulting composite, preserving salient regions.

2.2.1 Drag And Drop Pasting

Drag and Drop Pasting presents a method to optimize the boundary condition before applying Poisson image editing to combine the target and source images. The goal of the optimized boundary is to avoid salient image structures in both images as far as possible. The optimization process assumes that the optimal boundary can be found between the region of interest the user casually draws, Ω0, and the actual object of interest inside that region, ΩObj. The difference between the results of Poisson image editing and the improved boundary generated by Drag and Drop Pasting is shown in Figure 9.

Pasting a region of interest from the source image fs to the target image ft is done by solving the minimization problem in equation (2) using the guidance field v = ∇fs, given the user-defined boundary of Ω0:

minf ∫p∈Ω0 |∇f − v|² dp  with  f|∂Ω0 = ft|∂Ω0,   (2)


Figure 10: Drag and Drop Pasting: (a) The different regions used to create the optimal boundary; the cut C creates a genus-0 region needed to compute a shortest path. (b) The shortest paths from one end of the cut to the other.

where f is the resulting image and ∂Ω0 is the exterior boundary of Ω0. They solve it by first obtaining the associated Laplacian equation for f′ = f − fs using the boundary condition (ft − fs)|∂Ω0 and then adding back the original source image fs. The boundary condition determines the final result: the less variation among the pixel colors along the boundary, the better the resulting composite. Hence an optimal boundary ∂Ω is color-smooth. As mentioned earlier, the optimal boundary has to lie somewhere inside the user-marked region of interest Ω0 and outside the object of interest ΩObj. For most of their results Drag and Drop Pasting employs GrabCut [Rother et al. 2004] to automatically find ΩObj from a given Ω0. To optimize the color variance along the boundary, the following function is minimized:

E(∂Ω, k) = ∑p∈∂Ω ((ft(p) − fs(p)) − k)²,  ΩObj ⊂ Ω ⊂ Ω0,   (3)

where k is a constant value to be determined.

The optimized boundary is computed iteratively by initializing Ω as Ω0 and computing k as the average color difference along the border. Given the current k, the boundary ∂Ω is improved. The algorithm stops when two successive iterations do not improve the energy any further. The boundary optimization is equivalent to finding the shortest path in a graph constructed with the pixels of the region Ω0\ΩObj as nodes; edges represent the relationship to the four neighbouring pixels. A cost function with respect to k is applied to each node so that each path through the graph accumulates the cost of all its nodes. Since the boundary ∂Ω is a closed curve, the region Ω0\ΩObj is cut at its narrowest section to get a traceable path, as shown in Figure 10(a). In the corresponding graph all edges connecting pixels on either side of the cut are removed. Now the shortest path from each pixel on one side of the cut to the pixels on the other side is computed using 2D dynamic programming, as illustrated in Figure 10(b). The path with the globally minimum cost defines the optimal boundary.
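The alternating optimization of equation (3) can be sketched as follows; here a small fixed set of candidate boundaries stands in for the shortest-path search over the graph, and all names are illustrative:

```python
import numpy as np

def boundary_energy(diffs, k):
    """E(boundary, k) = sum over boundary pixels of ((f_t - f_s) - k)^2;
    diffs holds the per-pixel color differences f_t(p) - f_s(p)."""
    return float(np.sum((np.asarray(diffs) - k) ** 2))

def optimize_boundary(candidates, iters=10):
    """Toy version of the alternating scheme: start from the first candidate
    (the user-drawn boundary), then alternate between updating k as the mean
    color difference along the current boundary and re-selecting the
    lowest-energy boundary, stopping when the energy no longer improves."""
    diffs = candidates[0]
    for _ in range(iters):
        k = float(np.mean(diffs))
        best = min(candidates, key=lambda d: boundary_energy(d, k))
        if boundary_energy(best, k) >= boundary_energy(diffs, k):
            break
        diffs = best
    return diffs, k

# A boundary with nearly constant color difference beats a high-variance one.
user_drawn = np.array([0.0, 2.0, -2.0, 4.0])  # large variation along border
smooth = np.array([1.0, 1.1, 0.9, 1.0])       # nearly constant difference
chosen, k = optimize_boundary([user_drawn, smooth])
```

The constant k absorbs a global color offset between source and target, so only the variation of the difference along the boundary is penalized.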

Since the optimized boundary might intersect fine structures of the object of interest, it is possible to include alpha matting in the process by locating the regions of the border where alpha blending should be applied. To do that, a binary coverage mask M is defined which indicates where alpha blending should be applied. The alpha matte is integrated into the blended guidance field when solving the Poisson equation, and it is only applied to the regions where the boundary intersects the belt and matte compositing is needed. This is illustrated in Figure 11.

As shown with Drag and Drop Pasting, working in the gradient domain allows for seamless composition when inserting an object with complex outlines into a new background. Since image composition can also be done by assembling multiple image patches from multiple images, the same approach can be used for seamless image stitching. Photoshop's own Photomerge application uses gradient-domain compositing to create panorama pictures from multiple images.


Figure 11: Drag and Drop Pasting: (a)(b) Boundary created by the user and estimated region of interest. (c) The optimized boundary cuts the fine objects on top of the flower. (d) The problematic area. (e) An alpha matte is applied to the affected region. (f) The improved composite preserves the fine structures.

A similar effect is also achievable with Photoshop's Auto-Blend function. Yet the user has to manually cut out a region in the layer into which he wants to paste the object; the overlapping edges are then blended together.

2.3 Patch Transform

Cho et al. propose a novel approach for various image modifications called Patch Transform [Cho et al. 2008]. The basic idea is to break a given image into small, non-overlapping patches and apply possible modifications and constraints in this “patch domain”. The modified image is then reconstructed from these patches similar to solving a jigsaw puzzle. Possible user constraints include the spatial location of patches, the pool of patches from which the picture is reconstructed and the overall size of the image. Thereby image manipulations like repositioning objects or removing textures are possible. By setting both the position of some patches and the image size, image retargeting of similar quality to the technique described in section 6.2 is possible. Additionally, by combining patches from multiple images, a collage combining objects of these images can be created. Essentially the user only has to formulate his constraints while the image adjusts itself accordingly.

To reconstruct the image from the “patch domain” the patches must all fit together, producing a plausible image with minimal artefacts. Also, each patch should only be used once. This is the main difference to texture synthesis approaches: the resulting image is a true reconstruction of the existing patches, whereas texture synthesis allows single patches to be used multiple times.

After the user has set his constraints or modified patch statistics an “inverse patch transform” is performed. For this transform a probability for all possible patch combinations is computed to piece the patches together in a meaningful way while maintaining the user constraints on patch position. This is done by defining terms in a Markov network. In the Markov Random Field each node represents a spatial position of a patch with index xi. The compatibility between patch k and patch l is defined as ψi,j(k,l), where j keeps track of which of the four possible neighbours is referred to, and x is the vector of the unknown patch indices xi at each of the n image positions i. Additionally, to enforce that each patch is only used once, the exclusion function E(x) is defined as zero if any two elements of x are the same and one otherwise. The user's constraints on patch position are considered through the local evidence terms φi(xi). This leads to the equation for the probability of an assignment x of patches to image locations:

P(x) = (1/Z) ∏i φi(xi) ∏i,j(i) ψi,j(xi, xj) E(x)   (4)

Compatibility among patches is defined as ψi,j(k,l) = ψAi,j(k,l) + ψBi,j(k,l), where ψAi,j(k,l) is the natural image prior score and ψBi,j(k,l) is the color difference prior score for patches k and l being in the relative relationship of positions i and j.
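The terms of equation (4) are straightforward to evaluate for a candidate assignment. A small sketch (the data layout of `phi` and `psi` is a hypothetical choice for illustration, not the paper's):

```python
def patch_assignment_prob(x, phi, psi, edges, Z=1.0):
    """Unnormalized probability of equation (4): local evidence times
    pairwise compatibility times the exclusion term E(x)."""
    if len(set(x)) != len(x):         # E(x) = 0: some patch used twice
        return 0.0
    p = 1.0 / Z
    for i, xi in enumerate(x):
        p *= phi[i][xi]               # local evidence (user constraints)
    for i, j in edges:
        p *= psi[(i, j)][x[i]][x[j]]  # neighbour compatibility
    return p

# Two nodes, two patches, one edge.
phi = [[1.0, 0.5], [0.5, 1.0]]
psi = {(0, 1): [[0.1, 0.9], [0.9, 0.1]]}
p_good = patch_assignment_prob([0, 1], phi, psi, [(0, 1)])
p_repeat = patch_assignment_prob([0, 0], phi, psi, [(0, 1)])
```

The exclusion term makes repeated patches impossible, while the evidence and compatibility terms trade off user constraints against seam plausibility.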

For computing ψAi,j(k,l) the filters of the Gaussian Scale Mixture Field of Experts model (GSMFOE) [Roth and Black 2005], [Weiss and Freeman 2007] are applied in the following equation:

ψAi,j(k,l) = (1/Z) ∏l,m ∑q=1..J (πq/σq) exp(−wlT xm(k,l))   (5)

where x(k,l) is the luminance component at the boundary of patches (k,l), σq, πq are GSMFOE parameters and wl are the learned filters described in the model.

The score for the color difference ψBi,j(k,l) was found to improve the results and is computed with the following equation:

ψBi,j(k,l) ∝ exp(−(r(k) − r(l))² / σ²clr)   (6)

where r(.) is the color along the corresponding boundary and σclr is fixed as 0.2 after cross validation. The compatibility score is computed for all four possible spatial arrangements of all possible pairs of patches in the image.
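A sketch of this color term, summing the squared difference over the shared boundary pixels (the summation over the boundary is an assumption about how the per-pixel differences are aggregated):

```python
import numpy as np

def color_compat(r_k, r_l, sigma_clr=0.2):
    """Equation (6) up to proportionality: compatibility decays with
    the squared color difference along the shared patch boundary."""
    r_k = np.asarray(r_k, dtype=float)
    r_l = np.asarray(r_l, dtype=float)
    return float(np.exp(-np.sum((r_k - r_l) ** 2) / sigma_clr ** 2))

same = color_compat([0.2, 0.4, 0.4], [0.2, 0.4, 0.4])  # identical boundaries
diff = color_compat([0.2, 0.4, 0.4], [0.9, 0.1, 0.5])  # mismatched boundaries
```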

The assignment x that maximizes P(x) in equation (4) is found using Belief Propagation. This approximate method is an exact inference algorithm for Markov networks without loops, but can give good results in some networks with loops [Yedidia et al. 2003]. As Belief Propagation can settle at local minima, the patch transform is run multiple times with random initial seeds and the user can select the best-looking result. Additionally, visible seams between patches can be suppressed by using the Poisson equation [Perez et al. 2003].


Figure 12: Patch Transform: The method works well even with complex backgrounds. (a) The original image. (b) The inverse patch transform result.



Figure 13: Patch Transform: Example of how the patch transform can be used to reorganize objects in an image. (a) The original image. (b) The inverse patch transform result. (c) Another result showing the robustness of the framework to the size of the bounding box.


Figure 14: Patch Transform: Combining two images. (a), (b) The original images. (c) The combined image, although with some color bleeding of the foreground snow. (d) The distribution of patches from image (a) (yellow) and image (b) (green).

Patch transform allows for a number of different image manipulations. The user only issues constraints to the patch statistics and the patch transform generates an image according to these requests. The user constraints are represented in equation (4) by the local evidence terms.

To reorganize objects in an image the user defines a bounding box to select a region he wants to move and the desired location. The image is reorganized to incorporate the constraints while keeping the content intact according to the specified local evidence, as shown in Figure 12. The algorithm is robust to the size of the region the user selects as long as it is distinguishable, as illustrated in Figure 13. Additionally, users can manipulate the patch statistics of an image and thereby decide how many patches of a certain class are used to reconstruct the image, as shown in Figure 15.


Figure 15: Patch Transform: User manipulated patch statistics. (a) The original image. The tree was moved to the right. (b) With constraints to use fewer sky patches. (c) With constraints to use fewer cloud patches.

The patch transform can also be used to retarget images, as can be seen in the example in Figure 16. To do so, the size of the overall output is changed without changing the size of any patch.

This works because the local compatibility term tries to simply crop the image while the local evidence term tries to preserve information; the resulting image is a balance of the two. Compared to Seam Carving [Avidan and Shamir 2007], patch transform better preserves the global proportions at the cost of the salient local structures.


Figure 16: Patch Transform: An example of resizing an image. (a) The original image. (b) The inverse patch transform result.

Finally, the patch transform can combine two or more images by mixing their patches. For this the local evidence is kept uniform for all image nodes other than the ones within the bounding boxes, so the algorithm can determine the optimal arrangement of patches to generate a plausible image, as shown in Figure 14.

The implementation is reported to be still relatively unoptimized and takes about 10 minutes for the compatibility computation and an additional 3 minutes for the belief propagation. For most of the examples belief propagation was run 5 times and the best-looking result chosen. While the framework works especially well with textured or regular backgrounds like natural scenes or grid-type structures, input images that lack structure can lead to failure cases. Furthermore, when the user defined constraints and patches contain structures that cannot be reorganized into natural looking structures, it is impossible for the framework to generate a plausible image.

The main limitations of the current framework are that the control over patch position is limited by the size of the patch, and the large amount of computation. These problems must be solved before interactive image editing using the patch transform becomes feasible.

3 Image Completion

While the last section mainly focused on extracting foreground objects and composing them with new backgrounds, this section deals with techniques for removing objects from the background and filling the resulting holes, for example to remove damaged areas in an old photograph or the annoying power tower that spoils a beautiful landscape. Such techniques are summarized under the term image completion. Image completion is the process of filling missing parts of an image so that the result appears visually plausible. It can be useful in many areas like computer graphics applications, image editing, film post-production and image restoration.

3.1 Texture Synthesis

Texture synthesis is the process of constructing a larger image from a small digital sample image by taking advantage of its structural content. The problems of image completion and texture synthesis go hand in hand, as the texture to fill the missing part must be synthesized. The source can be either internal, from the same image, or external, from different images or data sources. Image inpainting, also called image interpolation in digital image editing, is a method for repairing damaged pictures or removing unwanted elements. It is mainly used where the missing parts are thin and elongated regions like scratches in old photographs, or text like subtitles or date stamps. Consequently most image completion methods include solutions to both texture synthesis and image inpainting.

3.2 Image Completion Techniques

Effective image completion techniques should be able not only to complete complex natural images but also to handle incomplete images with large missing parts. Techniques that deal with the image completion problem can be differentiated by following one of three main approaches.

Statistical-based methods try to describe an input texture statistically by applying some compact parametric statistical model. These statistics are then used to synthesize a new texture by perturbing an output image of pure noise until its statistics match the estimated statistics of the input texture. Yet these methods are primarily useful for texture synthesis and not for the general problem of image completion. They are only able to create textures that are mainly stochastic, not textures that contain structure. Therefore these models are mainly used to synthesize textures such as water, smoke or fire. They remain very useful in the process of analysing textures.

PDE-based methods are basically image inpainting methods. They employ a diffusion process to fill the missing parts of an image, smoothly propagating information from the boundary towards the interior of the missing region by solving a partial differential equation. Used primarily for image inpainting, they oversmooth larger missing regions.

Exemplar-based methods fill a missing region by copying content from an observed part of the image. This can be done either per pixel or patch based. Patch based methods achieve better results by maintaining higher order statistics of the input image. Used mainly for texture synthesis, these methods can also be extended to image completion. Problems can occur if a greedy algorithm is used: once a patch has been assigned to a missing region, it cannot be changed. These methods basically do what the clone brush does in Photoshop, but without the need for human precision when manually aligning the cloned region. Most recent image completion approaches use this method.
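The matching step that exemplar-based methods share can be sketched as a brute-force SSD search over the known pixels of a target patch (illustrative only; real systems search much larger buffers and often weight or blend pixels):

```python
import numpy as np

def best_source_patch(target, known, source, size):
    """Offset of the size x size window of `source` with the smallest
    sum of squared differences against the known pixels of `target`."""
    best, best_yx = np.inf, None
    h, w = source.shape
    for y in range(h - size + 1):
        for x in range(w - size + 1):
            cand = source[y:y + size, x:x + size]
            ssd = np.sum(((cand - target) ** 2)[known])  # only known pixels count
            if ssd < best:
                best, best_yx = ssd, (y, x)
    return best_yx

source = np.arange(16, dtype=float).reshape(4, 4)
target = source[1:3, 1:3].copy()          # a patch that exists verbatim
match = best_source_patch(target, np.ones((2, 2), bool), source, 2)
```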

Sun et al. introduced Image Completion with Structure Propagation [Sun et al. 2005]. In this approach the user manually extends a few curves or line segments from the known to the unknown regions to specify missing structures. Image patches are then synthesized along these curves in the unknown region using patches selected around the same curve in the known region. Structure propagation is formulated as a global optimization problem by enforcing structure and consistency constraints and is solved using either Dynamic Programming or Belief Propagation. After that the remaining unknown regions are filled with patch-based texture synthesis.

Komodakis et al. propose an exemplar-based approach for image completion called Image Completion using Efficient Belief Propagation via Priority Scheduling and Dynamic Pruning [Komodakis and Tziritas 2007]. Their approach needs no user interaction; they define image completion as a discrete global optimization problem with a well defined objective function which corresponds to the energy of a discrete Markov Random Field (MRF). They avoid the greedy patch assignment by allowing each block of missing pixels to change its assigned patch from a pool of source patch candidates. Furthermore they introduce two improvements to standard belief propagation, called priority-based message scheduling and dynamic label pruning, which bring a dramatic cost reduction for a huge number of existing labels.

However, most approaches do not consider the problem of perspective distortion and show only low resolution examples.

3.2.1 Interactive Image Completion with Perspective Correction

Another approach to the exemplar-based method is Interactive Image Completion with Perspective Correction by Pavić et al. [Pavić et al. 2006]. They designed an interactive system which promotes user input. One of the innovations of their approach is the consideration of perspective distortion when generating textures. By utilizing information about the approximate 3D structure in a scene they are able to apply perspective correction before copying patches. With the general assumption of fragment-based methods, that the scene within one fragment is planar, the input regions are rectified by applying projective transforms with the help of user input. Rectification refers to a 2D transform which aligns arbitrary 3D planes to the image plane. By performing the completion in the rectified image the system is able to copy fragments that would not match in the original image due to perspective distortion. The system supports multiple image buffers as source buffers S in which it looks for fragments for the target buffer T; these buffers can also come from multiple images. Each rectified image serves as an additional image buffer.

During the specification phase, the user specifies both a region to be replaced and a set of 3D planes that guide the rectification process.



Figure 17: Image Completion with Perspective Correction: (a) The input image. (b) The quad-grid used to approximate the curved surface. The white region marks the protected features of the image, where the algorithm looks for fragments. (c) The completed structure, which would be impossible without perspective correction.


Figure 18: Image Completion with Perspective Correction: (a) User drawn rectangles over a distorted image. (b) The rectified image. (c) The quad is not required to bound the object to define the rectification.

The user can further restrict the source fragment search to preserve features, similar to Structure Propagation [Sun et al. 2005]. During the completion process the user selects the target fragment location and size and the system incrementally fills the unknown region with the best fitting source fragment. The system works with circular fragments due to their rotational invariance. Searching is FFT-based, which allows for fast response times even for large image buffers.

The interaction system is based on drawing a rectangle directly into the image and aligning it to a region that corresponds to a rectangular planar object. With this information a projective transformation of the plane is realised. Even though 3D information is exploited, the transform is 2D, a homography, which is represented as a 3x3 matrix in homogeneous coordinates. How the surface is rectified with the help of the user defined rectangles is shown in Figure 18.
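Such a homography can be estimated from the four corner correspondences of the user-drawn rectangle with the standard Direct Linear Transform. A generic sketch, not the authors' exact solver:

```python
import numpy as np

def homography_from_points(src, dst):
    """3x3 homography H with dst ~ H * src (Direct Linear Transform):
    stack two linear constraints per correspondence and take the
    SVD null vector."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_h(H, p):
    """Apply a homography to a 2D point in homogeneous coordinates."""
    q = H @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]

# Rectify a trapezoid (a perspectively distorted rectangle)
# to the unit square.
quad = [(0.0, 0.0), (2.0, 0.0), (3.0, 2.0), (-1.0, 2.0)]
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
H = homography_from_points(quad, square)
```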


Figure 19: Image Completion with Perspective Correction: (a) The quad-grid interaction metaphor used for complex shapes. (b) The rectified image. The blue area marks user defined features.

For more complex objects like curved surfaces this interaction metaphor is extended to a grid of quads Qi,j, where each quad in the grid defines a homography to be rectified. This leads to an unfolded orthogonal view of the curved surface as shown in Figure 19. To ensure a continuous deformation of the input image the user is further guided by a snapping mechanism which adjusts the user defined grid so that homographies of neighbouring quads are continuous along the common boundary edge. Homographies are identical along a line if they coincide at a minimum of three points along that line. As two neighbouring homographies share their endpoints pi,j and pi+1,j, the inconsistency can be measured by the discrepancy of the mid point of the mappings. The snapping mechanism adjusts the vertices pi,j in image space by computing the mid point deviation of the inverse homography H−1. The sum of these deviations for a given grid point pi,j and all its adjacent edges defines a quality score for this vertex. Due to the low computation cost the algorithm then tests for each vertex, except the boundary vertices, whether moving it in one of the 8 principal directions improves the local quality. Figure 21 shows the result of the unfolded view before and after the relaxation process.
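One relaxation step of this snapping mechanism amounts to greedy local search over the 8 principal directions. A sketch with a generic `quality(p)` callback standing in for the mid-point deviation score (the callback and step size are illustrative assumptions):

```python
def relax_vertex(p, quality, step=1.0):
    """Try moving vertex p one `step` in each of the 8 principal
    directions; keep the move that most lowers `quality` (here a
    stand-in for the mid-point deviation of the adjacent edges)."""
    best_p, best_q = p, quality(p)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            cand = (p[0] + dx * step, p[1] + dy * step)
            if quality(cand) < best_q:
                best_p, best_q = cand, quality(cand)
    return best_p

# Toy score: squared distance to an ideal position (3, 4).
quality = lambda p: (p[0] - 3) ** 2 + (p[1] - 4) ** 2
p = (0.0, 0.0)
for _ in range(10):      # iterate until the vertex settles
    p = relax_vertex(p, quality)
```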

Figure 21: Image Completion with Perspective Correction: (a) The quad-grid as constructed by the user. (b) The piecewise homographies cause discontinuities at the common boundaries. (c) The improved grid after the relaxation procedure. (d) The improved unfolded orthogonal view.

The image completion is user guided as well. The user clicks on a certain pixel location in the unknown region with an adjusted fragment size and the best fitting valid fragment is applied. As the FFT-based matching procedure is extremely fast, the system also checks whether pixel locations in the close vicinity of the clicked pixel would provide a superior fragment match. If so, the system automatically snaps to that pixel.
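The fast search rests on the classic identity SSD = Σ S² − 2·corr(S, T) + Σ T², in which both window terms are convolutions and can be evaluated with FFTs. A numpy-only sketch (square windows here; the system actually uses circular fragments):

```python
import numpy as np

def ssd_map(source, frag):
    """Sum of squared differences between `frag` and every same-sized
    window of `source`, computed via FFT convolutions."""
    H, W = source.shape
    h, w = frag.shape
    fsz = (H + h - 1, W + w - 1)

    def valid_conv(a, b):
        # Full linear convolution via FFT, cropped to the 'valid' part.
        out = np.fft.irfft2(np.fft.rfft2(a, fsz) * np.fft.rfft2(b, fsz), fsz)
        return out[h - 1:H, w - 1:W]

    window_s2 = valid_conv(source ** 2, np.ones((h, w)))  # sum of S^2 per window
    cross = valid_conv(source, frag[::-1, ::-1])          # cross-correlation
    return window_s2 - 2.0 * cross + np.sum(frag ** 2)

source = np.arange(16, dtype=float).reshape(4, 4)
frag = source[1:3, 1:3].copy()
scores = ssd_map(source, frag)
```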

For the completion process valid image fragments have to be found. Potential source fragments are valid if they are sufficiently far away from the unknown region; for circular fragments this is two times the radius in pixels away from the source buffer boundary. Additionally, potential fragments shall not overlap with the unknown region. This is checked by setting the α matte of the source buffer to 1 in the unknown regions and 0 everywhere else. Further, a second buffer is established with zero entries everywhere except for pixels that belong to the interior of the flipped and padded target fragment. A convolution operation between these two buffers yields zero everywhere except at the positions where the fragments overlap the unknown region.

Figure 20: Image Completion with Perspective Correction: It is possible to combine information from multiple images. (a), (d) The input images. (b) The user input for image (a). (c) The completed image. (e) The information from images (c) and (d) is used to create the final image (f).

A feature of the image is defined as an arbitrary set of pixels that can consist of several connected components. Users are able to define features by drawing lines, freehand curves or marking entire regions. During the source fragment search the two-sided feature distance measure proposed in Image Completion with Structure Propagation [Sun et al. 2005] is evaluated by integrating the distance of all target feature pixels to the source features and vice versa. All source fragments whose two-sided feature distance measure (normalized by the number of feature pixels) exceeds a set tolerance h are discarded. This guarantees that the features are drawn continuously from the known to the unknown region.
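The two-sided measure can be sketched directly from its definition, averaging nearest-feature distances in both directions (point lists as a hypothetical representation of the feature pixels):

```python
import numpy as np

def two_sided_feature_distance(src_pts, tgt_pts):
    """Mean nearest-neighbour distance from every target feature pixel
    to the source features, plus the reverse direction; fragments whose
    score exceeds a tolerance are discarded."""
    src = np.asarray(src_pts, dtype=float)
    tgt = np.asarray(tgt_pts, dtype=float)
    d = np.linalg.norm(src[:, None, :] - tgt[None, :, :], axis=2)
    return d.min(axis=0).mean() + d.min(axis=1).mean()

zero = two_sided_feature_distance([(0, 0), (1, 0)], [(0, 0), (1, 0)])
shifted = two_sided_feature_distance([(0, 0), (1, 0)], [(0, 1), (1, 1)])
```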

To improve the quality and reduce the appearance of artefacts the pixel colors are computed by oversampling followed by low-pass filtering. Additionally, gradient correction is applied to reduce visible seams in the completed image. Figure 17 shows a completed image that would have been impossible without perspective correction.

Interactive image completion with perspective correction suffers from the same limitations as all exemplar-based image completion methods: if there is not sufficient information in the known region, the unknown regions cannot be filled in a meaningful way. Yet due to the fast computation of the algorithm, additional images can be included in the fragment search. Figure 20 shows an example where information from two images was used to generate the final image.

4 Image Deformation

Image deformation is the process of resizing parts of an image in a non-uniform way. While this can be done by stretching and scaling the texture of these deformed features, more advanced methods combine image deformation with texture synthesis to achieve more natural looking features.

4.1 Detail Preserving Shape Deformation

Fang et al. [Fang and Hart 2007] propose a shape deformation system that allows users to first trace one or more feature curves and then move them to generate deformations. The system automatically replaces all textures stretched by the deformation process with textures resynthesized from the original image. They achieve this by decoupling the deformation of morphological features from pixel color generation. Their patch based texture synthesis approach distorts the texture coordinates for each patch to align with the contours implied by the user modified feature curves.

The user first deforms an image I by placing curves pi(t) along the features he wants to alter and moves these curves to the desired positions. This results in corresponding feature curves p′i(t) for the destination image I′. These control curves are Bézier curves manipulated by moving their control points. The curves are then broken into disjoint segments and periodically sampled into a set of pixels for both source and target curves. From the correspondence between these pixel positions a discretely sampled smooth deformation is constructed by solving an image Laplacian. This ordinary Laplacian deformation is previewed to the user after moving the curves. Additionally, some feature curves are denoted as passive to make the initial deformation easier. These curves aid the texture orientation but follow the motion of the deformation field generated by the master feature curves. The feature curves for the input image shown in Figure 22(a) are shown in Figure 22(b).

To preserve the texture, curvilinear coordinates are grown around both the source and the target curve. For this the analytic curve tangents are sampled, stored at the curve sample pixel positions and diffused across the image. From these diffused tangents a local curvilinear coordinate system is constructed at any curve sample pixel. In screen coordinates, with a unit distance between pixels, the curvilinear coordinates are a 2D array of planar positions qj,k ∈ R² with q0,0 placed on a chosen origin pixel. The Euler integration

qj±1,0 = qj,0 ± εT(qj,0),   qj,k±1 = qj,k ± ε [0 −1; 1 0] T(qj,k)   (7)

defines a curve of positions parallel and perpendicular to the feature curve. These positions are grown into coordinate patches and smoothed with several Laplacian iterations. A corresponding curvilinear coordinate grid is grown for the destination curves as illustrated in Figure 22(c).

Figure 22: Detail Preserving Shape Deformation: (a) The input image. (b) The user defined destination feature curves for the deformation and the deformed image with stretched textures. (c) Curvilinear coordinate patches grown around the source and target feature curves. (d) The finished deformation with preserved texture frequency and orientation.
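Equation (7) can be turned into a small marching routine; `tangent(p)` is a stand-in for the lookup into the diffused tangent field:

```python
import numpy as np

def grow_curvilinear(origin, tangent, n, eps=1.0):
    """Euler integration of equation (7): march along the tangent field
    T to get positions parallel to the feature curve, then rotate T by
    90 degrees to march perpendicular to it."""
    rot90 = np.array([[0.0, -1.0], [1.0, 0.0]])
    q = np.zeros((n, n, 2))
    q[0, 0] = origin
    for j in range(1, n):                      # along the curve
        q[j, 0] = q[j - 1, 0] + eps * tangent(q[j - 1, 0])
    for j in range(n):                         # away from the curve
        for k in range(1, n):
            q[j, k] = q[j, k - 1] + eps * rot90 @ tangent(q[j, k - 1])
    return q

# Constant tangent field pointing in +x: the grid is a regular lattice.
grid = grow_curvilinear((0.0, 0.0), lambda p: np.array([1.0, 0.0]), n=4)
```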

The textures are then synthesised for the destination patch using the corresponding source patch as a texture swatch. These small synthesised patches have to be grown and merged into an even texturing over the entire image region surrounding the destination curve. Therefore, after generating the initial patch, subsequent patches are generated following the GraphCut [Kwatra et al.] method as shown in Figure 22(d). A pool of possible texture patches is generated from locations in the source image, and ultimately the patch that best overlaps with the patches already created in the destination image is selected. If the resulting seams between patches prove unsatisfactory, Poisson Image Editing [Perez et al. 2003] can be used to blend them together.

Compressing large source areas into small target areas causes artefacts due to texture continuity problems. This problem is solved by altering the texture synthesis sampling so that areas of high deformation compression are synthesized to smaller patches, as shown in Figure 23.



Figure 23: Detail Preserving Shape Deformation: (a) The input image with user applied feature curves. (b) The destination feature curves for the deformation. (c) Block artefacts due to compression of patches. (d) The fixed deformation with adaptive patch size.

The success of Detail Preserving Shape Deformation depends heavily on the selection of the feature curves. While often a few obvious curves are enough, sometimes a more exhaustive tracing is necessary. Therefore the technique could be improved by adding automatic detection and organization of image feature curves. It would further be beneficial to include matting to separate the object to be deformed from the background, to eliminate possible halo artefacts.

5 Gradient Based Image Editing

Gradient based operations like strengthening edges or removing a shadow boundary can be tedious in traditional intensity based paint programs. Gradient based operations are typically only found in the filters menu of paint programs and perform a global operation automatically or with minimal user guidance. They normally also force the user to wait several seconds for the result while the modified image gradients are reconstructed into a best-fit intensity image.

Furthermore, perceptual experiments suggest that the human visual system works by measuring local intensity differences rather than intensity itself. Yet interactive input for image editing software traditionally uses only absolute intensity.

5.1 Real-Time Gradient-Domain Painting

McCann and Pollard [McCann and Pollard 2008] propose a new way of working in the gradient domain featuring real time feedback on megapixel sized images. They achieve this by introducing a fast GPU multigrid method that allows integration at over 20 frames per second on modest hardware. In their paint program they offer a range of brushes coupled with special blend modes, similar to the tools found in today's intensity based paint programs. These tools allow users to paint gradients in much the same way they paint with intensity brushes. Using the various blend modes the painted gradients are combined with the existing gradients and displayed in real time by the integrator.

When implementing the brushes for a gradient based paint system one has to be aware that instead of editing a single intensity image, there are two gradient images: one for the x-gradient and one for the y-gradient. Therefore the brushes rely on both mouse position and stroke direction to generate gradients, and strokes in different directions have different meanings when painting in the gradient domain. This issue could be addressed by automatic direction determination or, at least, a direction switch key. Additionally, a specific color cannot be selected in an absolute way, as edges implicitly specify both a color on one side and its complement on the other. This has to be considered when implementing a feasible color chooser. Finally, it is possible to build up large range differences by drawing many parallel strokes, so an automatic range-compression/adjustment tool might be desirable.

McCann and Pollard present three different brushes in their gradient-domain paint tool. The simplest of these is the gradient brush. It allows users to paint strokes that emit gradients of the current color, perpendicular to the stroke direction. This allows for defining volume and shading by defining edges without touching the interior of shapes.
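The core of the gradient brush is just a 90-degree rotation of the stroke direction; a minimal sketch (the real brush also shapes the emitted gradient with a falloff profile, which is omitted here):

```python
import numpy as np

def gradient_brush(direction, color, strength=1.0):
    """Gradient emitted by the gradient brush: the current color,
    oriented perpendicular to the (normalized) stroke direction."""
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)
    normal = np.array([-d[1], d[0]])  # rotate stroke direction by 90 degrees
    return color * strength * normal

# A horizontal stroke emits a purely vertical gradient.
g = gradient_brush((1.0, 0.0), color=2.0)
```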

As the gradient brush does not allow for the more subtle textures and impact of natural edges found in real images, there is the edge brush. This brush allows the user to first select a desired edge in the image by painting a stroke along a segment of that edge. The gradients around this marked edge are captured. When painting with the edge brush these gradients are transformed to the local coordinate system of the current stroke and emitted. To allow for long strokes exceeding the size of the captured edge, the edge playback is looped and re-oriented. Examples of the results can be seen in Figure 24, where the roofline was extended using the edge brush. The additional cracks and colored stripes in Figure 25 were also added with the edge brush.

With the clone brush, gradient-domain painting treads on the field of image compositing, a classical and effective gradient domain application. The clone brush copies gradients relative to a source location onto a destination stroke. Because of the global integrator users can drag entire cloned regions. This allows for copying an object as well as its lighting effects. Figure 27 shows an example of the clone brush, which was used to copy a window.

The various blending modes allow the user to change how the newly created gradients combine with the background. Blending is a per-pixel operation and is applied across all color channels. For the background gradient (gx, gy) and the current gradient from the brush (bx, by) the equations for the different blend modes are the following.

Additive blending:

(gx, gy) = (bx, by) + (gx, gy)   (8)

This blend mode sums the gradients of the brush and the background. This allows for building up lines over time and for color and shadow adjustment, as well as building up texture over multiple cloning passes.

Maximum blending:

(gx, gy) = (gx, gy) if |(gx, gy)| > |(bx, by)|, else (bx, by)   (9)

This mode selects the larger of the brush and background gradients and is used when cloning or copying edges, providing a sort of automatic matting. Minimum blending works analogously and is needed when cloning smooth areas over noisy ones.

Over blending:

(gx, gy) = (bx, by)   (10)

This mode simply replaces the background with the brush, to erase texture with the gradient brush or when cloning.

Directional blending:

(gx, gy) = (gx, gy) · (1 + (bx·gx + by·gy) / (gx·gx + gy·gy)) (11)

Directional blending is a novel blending mode that enhances background gradients pointing in the same direction as the brush gradients and suppresses gradients pointing in the opposite direction. This mode is useful for lighting and contrast enhancements.
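As a concrete illustration, the four blend modes of Equations 8–11 can be sketched as per-pixel NumPy operations. This is only a sketch, not McCann and Pollard's implementation; the function names and the `eps` guard against division by zero are assumptions of this example.

```python
import numpy as np

def blend_additive(gx, gy, bx, by):
    # Eq. 8: sum brush and background gradients.
    return gx + bx, gy + by

def blend_maximum(gx, gy, bx, by):
    # Eq. 9: per pixel, keep whichever gradient has the larger magnitude.
    keep_bg = np.hypot(gx, gy) > np.hypot(bx, by)
    return np.where(keep_bg, gx, bx), np.where(keep_bg, gy, by)

def blend_over(gx, gy, bx, by):
    # Eq. 10: the brush simply replaces the background.
    return bx.copy(), by.copy()

def blend_directional(gx, gy, bx, by, eps=1e-8):
    # Eq. 11: scale the background gradient by how well the brush
    # gradient aligns with it (dot product normalised by |g|^2).
    scale = 1.0 + (bx * gx + by * gy) / (gx * gx + gy * gy + eps)
    return gx * scale, gy * scale
```

Minimum blending, mentioned above, would mirror `blend_maximum` with the comparison reversed.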

The core of real-time gradient-domain painting is the multigrid integrator, which provides real-time feedback to the user. To obtain the image u whose gradients are as close as possible, in the least-squares sense, to the gradient images Gx, Gy, one has to solve the Poisson equation, where f is the divergence of the target gradient field:

∇2u = f (12)

McCann and Pollard solve this equation iteratively using the multigrid method. One iteration of the multigrid solver, called a VCycle and illustrated in Figure 26, estimates the solution to the linear system Lh uh = fh by recursively estimating the solution to a coarser version of the system L2h u2h = f2h, where h denotes the spacing of the grid points. The solution is then refined using two Jacobi iterations. Operators P and R are used to effect these changes


(a) (b)

Figure 24: Gradient-Domain Painting: (a) The input image. (b) Edited image using the edge brush and the clone brush and various blend modes.

(a) (b)

Figure 25: Gradient-Domain Painting: (a) The input image. (b) Edited image using the edge brush and additive blending.

Figure 26: Gradient-Domain Painting: A single call to VCycle. Arrows show data flow.

in grid resolution: R takes a finer fh to a coarser f2h via convolution followed by subsampling, while P expands a coarse u2h to a finer uh by bilinear interpolation.
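The VCycle structure described above can be sketched for a one-dimensional Poisson problem. This toy version (plain NumPy, weighted Jacobi smoothing, full-weighting restriction R, linear-interpolation prolongation P) only illustrates the recursion; the real system is two-dimensional and GPU-resident, and all names here are illustrative assumptions.

```python
import numpy as np

def laplacian(u):
    # Unscaled 1-D Laplacian L u with zero Dirichlet boundaries.
    p = np.pad(u, 1)
    return p[:-2] - 2.0 * u + p[2:]

def jacobi(u, f, iters=2, omega=2.0 / 3.0):
    # Weighted Jacobi smoothing for L u = f.
    for _ in range(iters):
        p = np.pad(u, 1)
        u = (1 - omega) * u + omega * 0.5 * (p[:-2] + p[2:] - f)
    return u

def restrict(r):
    # R: full weighting (convolution), then subsample to spacing 2h.
    p = np.pad(r, 1)
    return (0.25 * p[:-2] + 0.5 * r + 0.25 * p[2:])[1::2]

def prolong(u2h, n):
    # P: linear interpolation back to the finer grid of spacing h.
    fine = np.zeros(n)
    fine[1::2] = u2h
    p = np.pad(fine, 1)
    fine[0::2] = 0.5 * (p[:-2] + p[2:])[0::2]
    return fine

def vcycle(u, f):
    if len(u) <= 3:                  # coarsest grid: just smooth a lot
        return jacobi(u, f, iters=20)
    u = jacobi(u, f)                 # pre-smoothing (two Jacobi steps)
    # Coarse residual equation; the factor 4 compensates for the doubled
    # grid spacing of the unscaled Laplacian (Galerkin condition in 1-D).
    r2h = 4.0 * restrict(f - laplacian(u))
    e2h = vcycle(np.zeros_like(r2h), r2h)
    u = u + prolong(e2h, len(u))     # coarse-grid correction
    return jacobi(u, f)              # post-smoothing
```

Repeated V-cycles drive u toward the least-squares solution; McCann and Pollard instead run one (two-dimensional) VCycle per frame, so the solution converges over successive frames while remaining interactive.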

The implementation is GPU-based, storing all data matrices as 16-bit floating-point textures and integrating the color channels in parallel. One VCycle is run every frame to provide interactive and consistent performance.

Gradient-domain painting comes at the cost of increased memory requirements compared to conventional image editors. For good performance this memory must also be GPU-local. However, techniques for tiling large images into smaller pieces and only loading the tiles involved in the current operation into memory could work with the integrator, as all memory operations in the algorithm are local. Alternatively, McCann and Pollard consider adapting the streaming multigrid presented by Kazhdan et al. for their purposes.

Gradient domain painting is certainly an exciting prospect for image editing and has great potential to be included in future commercial applications.

6 Content Aware Resizing

Effective image resizing has become more important in recent years with the desire to display images on devices with limited screen size and bandwidth, like mobile phones, PDAs and handheld PCs. Moreover, when working with HTML or other standards, it is possible to dynamically change the page layout and text while images remain static and fixed. Images therefore need to be resized in a way that preserves the salient regions of the image.

Unfortunately, classic resizing techniques like cropping and scaling


Figure 27: Gradient-Domain Painting: Example of a cloned object created with the clone brush and improved with the gradient brush. (Original image at the top, modified image at the bottom.)

are often insufficient to meet these requirements. Image scaling uniformly resizes an image without adding or removing parts of it. But when decreasing the image size, key regions might become too small, and when the aspect ratio is modified, the image becomes distorted.

Cropping is limited by only removing pixels from the image periphery and by the difficulty of automating the process. The solution to the problem seems to be adapting the image to a new size while emphasizing the regions of interest and keeping the context intact.

6.1 Cropping and Retargeting

There are several approaches to the resizing problem focused on extracting key regions of interest (ROIs) based on saliency maps or face detection. Once the ROI is known, it is possible to crop to these regions, as done by Suh et al. [Suh et al. 2003]. Liu and Gleicher [Liu and Gleicher 2005] use a non-linear fisheye-view warp to emphasize ROIs while shrinking the rest of the image. But these two methods have problems if there is more than one main ROI. Another method, proposed by Setlur et al. [Setlur et al. 2005], first segments the image into regions and identifies the ROIs. If all detected regions fit inside the desired image size, cropping is used. Otherwise, the ROIs are cut from the background, which is restored using image completion techniques to fill the holes. This new image and the ROIs are resized using standard scaling and pasted back together to form the final image.

All these methods achieve workable results but rely on traditional image resizing and cropping operations to actually change the size of the image. They are also primarily useful for decreasing the size of the image to generate meaningful representations for small displays or thumbnails.

6.2 Image Retargeting Using Seam Carving

The initial idea of seam carving is quite simple. When reducing the size of an image, find unnoticeable pixels that blend with their surroundings and remove those first. The most straightforward way would be to remove the pixels with the lowest energy in ascending order. But such an approach would destroy the rectangular shape of the image, as low energy pixels are most likely not evenly distributed. To preserve the shape, an equal number of pixels has to be removed in each row/column. However, selecting the pixels of each row only based on the energy function would destroy the image content by creating a zigzag effect. Removing whole columns with the lowest energy gives better results, but artefacts might still appear.

The strategy of seam carving, as proposed by Avidan and Shamir [Avidan and Shamir 2007], preserves the image shape and content and is less restrictive than cropping or column removal by finding and removing seams of low energy pixels. A seam is an 8-connected path of low energy pixels that runs either vertically or horizontally through the image, containing one pixel in each row/column of the picture. Both a vertical and a horizontal seam are marked in the example image in Figure 29. The result of seam carving shown in this example is superior to the scaled image.

If I is an n×m image, a vertical seam s^x is defined as

s^x = {s^x_i}^n_{i=1} = {(x(i), i)}^n_{i=1}, s.t. ∀i: |x(i) − x(i−1)| ≤ 1, (13)

where x is a mapping x : [1, ..., n] → [1, ..., m]. The horizontal seam s^y is defined analogously for a mapping y : [1, ..., m] → [1, ..., n].

It is important that each seam contains only a single pixel in each row/column of the image, so removing a seam has only a local effect, similar to the removal of a whole row or column. All pixels of the image are shifted left (or up) to compensate for the missing path.

Figure 28: Seam Carving: Comparing the different energy functions e1 and eHoG against the original image.

To determine the energy of the pixels, and subsequently the cost of a seam, there are several possible methods to measure image importance. Of the tested energy functions, both e1 and eHoG work quite well, but no single energy function performs well across all images. The two energy functions and their slightly different results are shown in Figure 28.


Figure 29: Seam Carving: The original image on the left with a horizontal and a vertical seam in red. In the middle, the energy function. The retargeted image is shown on the top right and, as a comparison, the scaled image on the bottom right.

e1(I) = |∂I/∂x| + |∂I/∂y| (14)

eHoG(I) = (|∂I/∂x| + |∂I/∂y|) / max(HoG(I(x,y))) (15)

where HoG(I(x,y)) is a histogram of oriented gradients at every pixel.
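The e1 energy of Equation 14 reduces to a few lines of NumPy. This sketch uses forward differences with the last row/column repeated, which is one of several reasonable discretizations; the paper does not mandate a specific one.

```python
import numpy as np

def e1(img):
    # |dI/dx| + |dI/dy| via forward differences; the last column/row is
    # repeated so the output keeps the input's shape.
    img = img.astype(float)
    dx = np.abs(np.diff(img, axis=1, append=img[:, -1:]))
    dy = np.abs(np.diff(img, axis=0, append=img[-1:, :]))
    return dx + dy
```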

Given an energy function e the cost of a seam is defined as

E(s) = E(I_s) = ∑^n_{i=1} e(I(s_i)) (16)

and the cost for the optimal seam s∗ is minimized using dynamic programming. For vertical seams, the image is traversed from the second row to the last row and the cumulative minimum energy M for all possible connected seams is computed for each entry (i, j):

M(i, j) = e(i, j) + min(M(i−1, j−1), M(i−1, j), M(i−1, j+1))

The minimum value in the last row of M represents the end of the minimal connected vertical seam. The process then backtracks from this entry to find the path of the optimal seam. The definition for horizontal seams is similar.
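The dynamic program just described can be sketched directly in NumPy: build the cumulative map M row by row, then backtrack from the cheapest entry in the last row. The function name is illustrative, not from the paper's code.

```python
import numpy as np

def find_vertical_seam(energy):
    n, m = energy.shape
    M = energy.astype(float)
    for i in range(1, n):                      # second row to last row
        left = np.pad(M[i - 1], 1, constant_values=np.inf)
        # each entry adds the cheapest of its three upper neighbours
        M[i] += np.minimum(np.minimum(left[:-2], M[i - 1]), left[2:])
    seam = np.empty(n, dtype=int)
    seam[-1] = int(np.argmin(M[-1]))           # end of the minimal seam
    for i in range(n - 2, -1, -1):             # backtrack upwards
        j = seam[i + 1]
        lo, hi = max(j - 1, 0), min(j + 2, m)
        seam[i] = lo + int(np.argmin(M[i, lo:hi]))
    return seam
```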

Once the seams are found, reducing the size of a given image I in one dimension can simply be achieved by successively removing seams.
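Removing a found seam is then a matter of deleting one pixel per row and closing the gap. A sketch that works for grayscale or color arrays, with `seam[i]` being the seam's column in row i:

```python
import numpy as np

def remove_vertical_seam(img, seam):
    n, m = img.shape[:2]
    keep = np.ones((n, m), dtype=bool)
    keep[np.arange(n), seam] = False           # drop one pixel per row
    # the remaining pixels shift left, shrinking the width by one
    return img[keep].reshape(n, m - 1, *img.shape[2:])
```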

But for general changes of aspect ratio, where the image size is changed in both the horizontal and vertical direction, the correct order of seam removal has to be found. It is possible to remove either vertical or horizontal seams first, or to alternate between vertical and horizontal seam removal. For the optimal order, the strategy is to minimize the function

min_{s^x, s^y, α} ∑^k_{i=1} E(α_i s^x_i + (1 − α_i) s^y_i), (17)

where k = r + c with r = (n − n′) and c = (m − m′), and α_i ∈ {0, 1} is a parameter that determines whether a horizontal or vertical seam is removed at step i.

A transport map T is created that specifies, for each desired target image size, the cost of the optimal sequence of vertical and horizontal seam removal operations. For a desired image size n′ × m′, where n′ = n − r and m′ = m − c, the entry T(r, c) holds the minimal cost needed. Using dynamic programming, T is computed starting at T(0, 0) = 0 and choosing for each entry (r, c) the better of two options using the equation

T(r, c) = min( T(r−1, c) + E(s^x(I_{(n−r−1)×(m−c)})),
               T(r, c−1) + E(s^y(I_{(n−r)×(m−c−1)})) ), (18)

where I_{(n−r)×(m−c)} denotes an image of size (n−r)×(m−c), and E(s^x(I)) and E(s^y(I)) are the costs of the respective seam removal operations. The chosen operation for each entry is stored in a 1-bit map, which is then backtracked from T(r, c) to T(0, 0), applying the corresponding removal operation at each step.

So far, all changes in image size were reductions, because enlarging images using seam carving needs special attention, as shown in Figure 30. For image enlarging, new artificial seams are inserted into the image. The optimal seam is duplicated by averaging its pixels with their left and right neighbours for vertical seams, or with their top and bottom neighbours in the horizontal case. But repeating this process would always choose the same seam, creating serious stretching artefacts. To avoid this, the process of seam removal is seen as a time-evolution process where a smaller image I(t) is created after removing t seams from I. Image enlarging can then be seen as the reversal of this time evolution, and the resulting image is denoted I(−1). Therefore, to enlarge an image by k, the first k seams that would be removed for size reduction have to be found and duplicated; otherwise only the seam with the lowest energy is duplicated multiple times. For enlargement greater than 50% of the image size, the process has to be executed in several steps, where each step only enlarges the image by a fraction of its size from the previous step. Still, excessive scaling is likely to produce noticeable artefacts.
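Seam insertion, the enlargement primitive described above, can be sketched as follows for a grayscale image. The duplicate pixel in each row is the average of the seam pixel's left and right neighbours; this literal reading of the averaging rule is an assumption, and `seam[i]` is the seam's column in row i.

```python
import numpy as np

def insert_vertical_seam(img, seam):
    n, m = img.shape
    out = np.empty((n, m + 1), dtype=float)
    for i, j in enumerate(seam):
        left = img[i, max(j - 1, 0)]
        right = img[i, min(j + 1, m - 1)]
        out[i, :j + 1] = img[i, :j + 1]
        out[i, j + 1] = 0.5 * (left + right)   # averaged duplicate pixel
        out[i, j + 2:] = img[i, j + 1:]
    return out
```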

It is possible to combine seam carving with Poisson reconstruction to reduce visible artefacts in the resized images. After computing the energy function, the seam removal operation is applied to the gradient of the original image. At the end, the image is reconstructed using a Poisson solver.

Seam carving can also be used for content amplification. This can be achieved by first scaling the image and then applying seam carving to reduce the image to its original size.

Another interesting application of seam carving is object removal, as illustrated in Figure 31. The user marks regions he wants to remove, which sets their energy to zero. It is also possible to specify regions the user wants to preserve by setting their energy to infinity. Seams are removed until all marked pixels are gone. This obviously reduces the size of the image. To regain


(a)

(b)

(c)

(d)

Figure 30: Seam Carving: (a) The original image to be enlarged. (b) Choosing only the optimal seam duplicates one seam multiple times. (c) The k seams that would be removed for size reduction. (d) The retargeted image.

the original size, seam insertion has to be employed. Compared to other object removal techniques, this alters the whole image, either in size or, if it is restored to its initial size, in content.

Seam carving also proposes a method of storing multi-size images. Such images are able to change in size according to the needs of the user, like an image on a web page that is resized automatically depending on the resolution the site is viewed in. While seam carving is linear in the number of pixels, and therefore in the number of seams to be removed or inserted, computing tens or hundreds of seams in real time would not be feasible. Multi-size images are defined by encoding, for each pixel, the index of the seam that removed it in an index map. For example, V(i, j) = t means that pixel (i, j) was removed by the t-th seam removal. To get an image of width m′, all pixels in each row with a seam index greater than or equal to m − m′ are gathered. Furthermore, to support image enlarging, new pixels are inserted as the average of the k-th seam and its left and right pixel neighbours. These seams are given a negative index starting at −1. To enlarge an image by k, all pixels whose seam index is greater than (m − (m + k)) = −k are gathered.
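With a vertical index map V in hand, retargeting to a smaller width is just a gather per row, as described above. A sketch assuming 0-based seam indices, so a row of original width m keeps exactly the pixels with index ≥ m − m′:

```python
import numpy as np

def retarget_width(img, V, m_new):
    n, m = img.shape
    out = np.empty((n, m_new), dtype=img.dtype)
    for i in range(n):
        # keep the pixels removed last, i.e. seam index >= m - m_new
        out[i] = img[i, V[i] >= m - m_new]
    return out
```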

While the index maps for horizontal and vertical seams, as shownin Figure 32, are computed in a similar manner, supporting both

(a) (b) (c)

Figure 31: Seam Carving: (a) Removing an object from the original image. (b) Marked regions: red to preserve, green to remove. (c) The retargeted image with the girl removed.

(a) (b) (c)

Figure 32: Seam Carving: (a) The original image. (b) The horizontal index map H. (c) The vertical index map V. Colored by index from blue (removed first) to red (removed last).

dimension resizing with independently computed index maps will not work. The reason is that horizontal and vertical seams might share more than one pixel, and removing a seam in one direction might invalidate the index map of the other. One possible solution is to use seam removal in one direction while using whole rows or columns in the other. As a result, the respective multi-size image would differ from one created by the optimal order of seams.

While seam carving shows impressive results, it is not without limitations. In some cases, using a face detector in addition to the energy function can improve results. Still, seam carving fails if the image has no unimportant areas. If the content is too condensed, no seams can be found that bypass the important regions, leading to deformed objects. In such cases standard scaling would be more feasible. The same problem appears if content spans the entire image: again, seams cannot bypass the object, leading to distortion in the object.

7 Conclusion

This report featured some new techniques that might become an integral part of commercial image editing software in the future. Applications like soft scissors and gradient-domain painting show how efficient algorithms can improve former “apply and wait” effects and give direct control to the user, making them interactive and intuitive to use. Other approaches, like the patch transform, show great promise but have yet to be optimized for end-user applications. Advanced image completion techniques like image completion with perspective correction are a powerful alternative to the clone brush.

In the past, Adobe has displayed great foresight as to what new features are wanted by both the professional artist and the home user alike.


They have also shown good instinct in hiring talented people involved with the features they want to add. For example, Adobe hired Ariel Shamir, the co-developer of seam carving for content-aware resizing, in 2007. The technique already made it into Photoshop's recent CS4 release, giving users access to this powerful new method of image resizing.

References

ADOBE, 2008. Photoshop. http://www.adobe.com/.

AVIDAN, S., AND SHAMIR, A. 2007. Seam carving for content-aware image resizing. ACM Transactions on Graphics 26, 3.

CHO, T. S., BUTMAN, M., AVIDAN, S., AND FREEMAN, W. T. 2008. The patch transform and its applications to image editing. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

CHUANG, Y.-Y., CURLESS, B., SALESIN, D. H., AND SZELISKI, R. 2001. A Bayesian approach to digital matting. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

COREL, 2008. Paint shop pro. http://www.corel.com/.

FANG, H., AND HART, J. C. 2007. Detail preserving shape deformation in image editing. ACM Transactions on Graphics 26, 3.

GRADY, L. 2006. Random walks for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 28, 11.

JIA, J., SUN, J., TANG, C.-K., AND SHUM, H.-Y. 2006. Drag-and-drop pasting. ACM Transactions on Graphics 25, 3.

KOMODAKIS, N., AND TZIRITAS, G. 2007. Image completion using efficient belief propagation via priority scheduling and dynamic pruning. IEEE Transactions on Image Processing 16, 11.

KWATRA, V., SCHÖDL, A., ESSA, I., TURK, G., AND BOBICK, A. 2003. Graphcut textures: Image and video synthesis using graph cuts. ACM Transactions on Graphics 22, 3.

LEVIN, A., LISCHINSKI, D., AND WEISS, Y. 2008. A closed-form solution to natural image matting. IEEE Transactions on Pattern Analysis and Machine Intelligence 30, 2.

LIU, F., AND GLEICHER, M. 2005. Automatic image retargetingwith fisheye-view warping. In Proceedings of ACM symposiumon User Interface Software and Technology.

MCCANN, J., AND POLLARD, N. S. 2008. Real-time gradient-domain painting. ACM Transactions on Graphics 27, 3.

PAVIĆ, D., SCHOENEFELD, V., AND KOBBELT, L. 2006. Interactive image completion with perspective correction. The Visual Computer 22, 9.

PEREZ, P., GANGNET, M., AND BLAKE, A. 2003. Poisson imageediting. ACM Transactions on Graphics 22, 3.

ROTH, S., AND BLACK, M. J. 2005. Fields of experts: A frame-work for learning image priors. In Proceedings of IEEE Confer-ence on Computer Vision and Pattern Recognition.

ROTHER, C., KOLMOGOROV, V., AND BLAKE, A. 2004. Grabcut:interactive foreground extraction using iterated graph cuts. ACMTransactions on Graphics 23, 3.

SETLUR, V., TAKAGI, S., RASKAR, R., GLEICHER, M., AND GOOCH, B. 2005. Automatic image retargeting. In Proceedings of the International Conference on Mobile and Ubiquitous Multimedia.

SUH, B., LING, H., BEDERSON, B. B., AND JACOBS, D. W. 2003. Automatic thumbnail cropping and its effectiveness. In Proceedings of ACM symposium on User Interface Software and Technology.

SUN, J., JIA, J., TANG, C.-K., AND SHUM, H.-Y. 2004. Poissonmatting. ACM Transactions on Graphics 23, 3.

SUN, J., YUAN, L., JIA, J., AND SHUM, H.-Y. 2005. Imagecompletion with structure propagation. In Proceedings of ACMSIGGRAPH 2005.

THE GIMP DEVELOPMENT TEAM, 2008. GNU Image Manipulation Program (GIMP). http://www.gimp.org/.

WANG, J., AND COHEN, M. F. 2005. An iterative optimization approach for unified image segmentation and matting. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

WANG, J., AND COHEN, M. F. 2007. Optimized color sampling for robust matting. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

WANG, J., AGRAWALA, M., AND COHEN, M. F. 2007. Soft scissors: An interactive tool for realtime high quality matting. ACM Transactions on Graphics 26, 3.

WEISS, Y., AND FREEMAN, W. T. 2007. What makes a good model of natural images? In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

YEDIDIA, J. S., FREEMAN, W. T., AND WEISS, Y. 2003. Understanding belief propagation and its generalizations. In Exploring Artificial Intelligence in the New Millennium. Morgan Kaufmann.

