Image Analogies

Page 1: Image Analogies

Image Analogies

Aaron Hertzmann (1,2)  Charles E. Jacobs (2)
Nuria Oliver (2)  Brian Curless (3)
David H. Salesin (2,3)

1 New York University  2 Microsoft Research  3 University of Washington

Page 2: Image Analogies

Introduction

A·nal·o·gy: A systematic comparison between structures that uses properties of and relations between objects of a source structure to infer properties of and relations between objects of a target structure.

Given a pair of images A and A' (the unfiltered and filtered source images, respectively), along with some additional unfiltered target image B, synthesize a new filtered target image B' such that A : A' :: B : B'.

Page 3: Image Analogies

Introduction

We use an autoregression algorithm, based primarily on recent work in texture synthesis by Wei and Levoy and by Ashikhmin.

Indeed, our approach can be thought of as a combination of these two approaches, along with a generalization to the situation of corresponding pairs of images, rather than single textures.

In order to allow statistics from an image A to be applied to an image B with completely different colors, we sometimes operate in a preprocessed luminance space.

Page 4: Image Analogies

Introduction

Applications:
- Traditional texture synthesis
- Improved texture synthesis
- Super-resolution
- Texture transfer
- Artistic filters
- Texture-by-numbers

Page 5: Image Analogies

Related work

- Machine learning for graphics
- Texture synthesis
- Non-photorealistic rendering
- Example-based NPR

Page 6: Image Analogies

Algorithm

As input, our algorithm takes a set of three images: the unfiltered source image A, the filtered source image A', and the unfiltered target image B.

It produces the filtered target image B' as output.

Our approach assumes that the two source images are registered: the colors at and around any given pixel p in A correspond to the colors at and around that same pixel p in A'.

We are trying to learn the image filter.

Page 7: Image Analogies

Algorithm

We use A(p) (or A'(p)) to denote the complete feature vector of A (or A') at pixel p.

Similarly, we use B(q) (or B'(q)) to denote the complete feature vector of B (or B') at pixel q.

We will need to keep track of the position p of the source pixel that was copied to pixel q of the target, so we store this in an additional data structure s(·); for example, s(q) = p.

We will actually use a multiscale representation of all five of these quantities in our algorithm; we typically index each of these arrays by their multiscale level $l$ using subscripts.

For example, if $A_l$ represents the source image A at a given resolution, then $A_{l-1}$ represents a corresponding lower-resolution image at the next coarser level.
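To make the multiscale representation concrete, here is a minimal Python sketch of a Gaussian pyramid builder. The paper does not give this code; the function name, the blur sigma, and the grayscale (2-D image) assumption are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_pyramid(img: np.ndarray, levels: int) -> list:
    """Return levels of img from coarsest to finest, so that index l-1
    is the next coarser level relative to index l (assumes a 2-D image)."""
    pyramid = [np.asarray(img, dtype=np.float64)]
    for _ in range(levels - 1):
        blurred = gaussian_filter(pyramid[-1], sigma=1.0)  # low-pass first
        pyramid.append(blurred[::2, ::2])                  # then halve resolution
    pyramid.reverse()  # pyramid[0] is coarsest, pyramid[-1] is finest (level L)
    return pyramid
```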

Page 8: Image Analogies

Algorithm

First, in an initialization phase, multiscale (Gaussian pyramid) representations of A, A', and B are constructed, along with their feature vectors and some additional indices used for speeding up the matching process.

We use L to denote the maximum level (the highest resolution).

At each level $l$, statistics pertaining to each pixel $q$ in the target pair are compared against statistics for every pixel $p$ in the source pair, and the "best" match is found. The pixel that matched best is recorded in $s_l(q)$.
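A hedged sketch of this outer loop in Python. The helper best_match and the pyramid builder above are assumed; this paraphrases the procedure described on the slide rather than reproducing the authors' implementation.

```python
import numpy as np

def create_image_analogy(A, A_prime, B, levels):
    A_pyr  = gaussian_pyramid(A, levels)
    Ap_pyr = gaussian_pyramid(A_prime, levels)
    B_pyr  = gaussian_pyramid(B, levels)
    Bp_pyr = [np.zeros_like(b) for b in B_pyr]  # B' is synthesized level by level
    s = [dict() for _ in range(levels)]         # s[l][q] = best-matching source pixel p
    for l in range(levels):                     # coarsest to finest
        h, w = B_pyr[l].shape[:2]
        for q in ((y, x) for y in range(h) for x in range(w)):  # scan-line order
            p = best_match(A_pyr, Ap_pyr, B_pyr, Bp_pyr, s, l, q)
            Bp_pyr[l][q] = Ap_pyr[l][p]         # copy the filtered source pixel
            s[l][q] = p                         # record where it came from
    return Bp_pyr[-1]                           # finest-level B'
```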

Page 9: Image Analogies

Algorithm

Page 10: Image Analogies

Algorithm

The heart of the image analogies algorithm is the BestMatch subroutine.

The routine finds the pixel $p$ in the source pair that best matches the pixel being synthesized, using two different approaches:

- an approximate search, which attempts to efficiently find the closest-matching pixel according to the feature vectors of $p$, $q$, and their neighborhoods;
- a coherence search, based on Ashikhmin's approach, which attempts to preserve coherence with the neighboring synthesized pixels.
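A sketch of how the two searches can be combined. The helper names (best_approximate_match, best_coherence_match, feature_distance) and the default κ are assumptions; the acceptance test uses the κ-weighted comparison described on the following slide.

```python
def best_match(A_pyr, Ap_pyr, B_pyr, Bp_pyr, s, l, q, kappa=2.0):
    """Pick a source pixel p for target pixel q at level l."""
    L = len(B_pyr) - 1                         # index of the finest level
    p_app = best_approximate_match(A_pyr, Ap_pyr, B_pyr, Bp_pyr, l, q)
    p_coh = best_coherence_match(A_pyr, Ap_pyr, B_pyr, Bp_pyr, s, l, q)
    d_app = feature_distance(l, p_app, q)      # ||F_l(p_app) - F_l(q)||^2
    d_coh = feature_distance(l, p_coh, q)
    # Prefer the coherent pixel unless the approximate match beats it by the
    # margin (1 + 2^(l-L) * kappa); larger kappa favors coherence.
    if d_coh <= d_app * (1 + 2 ** (l - L) * kappa):
        return p_coh
    return p_app
```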

Page 11: Image Analogies

Algorithm

Page 12: Image Analogies

Algorithm

Since the L2-norm is an imperfect measure of perceptual similarity, coherent pixels will often look better than the best match under L2.

Thus, the larger the value of $\kappa$, the more coherence is favored over accuracy in the synthesized image.

In order to keep the coherence term consistent at different scales, we attenuate it by a factor of $2^{l-L}$, since pixel locations at coarser scales are spaced further apart than at finer scales.

We typically use $2 \le \kappa \le 25$ for color non-photorealistic filters, $0.5 \le \kappa \le 5$ for line art filters, and $\kappa = 1$ for texture synthesis.

Page 13: Image Analogies

Algorithm

We use $F_l(p)$ to denote the concatenation of all the feature vectors within some neighborhood $N(p)$ of both source images A and A', at both the current resolution level $l$ and at the coarser resolution level $l-1$.

The norm $\|F_l(p) - F_l(q)\|^2$ is computed as a weighted distance over the feature vectors $F(p)$ and $F(q)$, using a Gaussian kernel, so that differences in the feature vectors of pixels further from $p$ and $q$ have a smaller weight relative to the differences at $p$ and $q$.
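A self-contained sketch of the Gaussian-weighted norm for a single scalar feature (e.g., luminance). The neighborhood size and sigma are illustrative assumptions, and the real feature vectors also concatenate the A/A' pair and the coarser level.

```python
import numpy as np

def gaussian_kernel(size: int, sigma: float) -> np.ndarray:
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def weighted_feature_distance(N_p: np.ndarray, N_q: np.ndarray,
                              sigma: float = 1.0) -> float:
    """||F(p) - F(q)||^2 over two (size, size) neighborhoods: squared
    differences far from the center pixel receive smaller weight."""
    w = gaussian_kernel(N_p.shape[0], sigma)
    return float(np.sum(w * (N_p - N_q) ** 2))
```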

Page 14: Image Analogies

Algorithm

For the BestApproximateMatch procedure, we have tried using both approximate-nearest-neighbor search (ANN) and tree-structured vector quantization (TSVQ), using the same norm over the feature vectors.
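The ANN library itself is not shown here; as a hedged stand-in, a k-d tree over the source feature vectors supports the same kind of query. scipy's cKDTree.query takes an eps argument that trades accuracy for speed, loosely mirroring approximate search.

```python
import numpy as np
from scipy.spatial import cKDTree

def build_search_structure(source_features: np.ndarray) -> cKDTree:
    """source_features: (num_pixels, feature_dim), one row per source pixel p.
    The Gaussian weighting can be folded in by pre-scaling each feature
    component by the square root of its kernel weight."""
    return cKDTree(source_features)

def best_approximate_match(tree: cKDTree, target_feature: np.ndarray) -> int:
    """Index of the source pixel whose feature vector is (approximately)
    nearest to the target's."""
    _, idx = tree.query(target_feature, k=1, eps=0.5)  # eps > 0: approximate
    return int(idx)
```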

The BestCoherenceMatch procedure simply returns $s(r^\star) + (q - r^\star)$, where

$$r^\star = \arg\min_{r \in N(q)} \| F_l(s(r) + (q - r)) - F_l(q) \|^2$$

and $N(q)$ is the neighborhood of already-synthesized pixels adjacent to $q$ in $B'_l$. This formula essentially returns the best pixel that is coherent with some already-synthesized portion of $B'_l$ adjacent to $q$, which is the key insight of Ashikhmin's method.
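A simplified, self-contained sketch of this search. The neighborhood iterator and distance helper are assumed, and q, r, and s[l][r] are (row, col) tuples.

```python
def best_coherence_match(s, l, q, synthesized_neighbors, feature_distance):
    """Return s(r*) + (q - r*) for the r* minimizing
    ||F_l(s(r) + (q - r)) - F_l(q)||^2 over already-synthesized neighbors r."""
    best, best_d = None, float("inf")
    for r in synthesized_neighbors(q):          # N(q): synthesized pixels next to q
        p = (s[l][r][0] + q[0] - r[0],          # shifted source pixel s(r) + (q - r)
             s[l][r][1] + q[1] - r[1])
        d = feature_distance(l, p, q)
        if d < best_d:
            best, best_d = p, d                 # keep the most coherent candidate
    return best                                 # assumes N(q) is non-empty
```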

Page 15: Image Analogies

Algorithm

Page 16: Image Analogies

Algorithm

However, for some filters, we found that our source pairs did not contain enough data to match the target pair well using RGB color.

An alternative, which we have used to generate many of the results shown in this paper, is to compute and store the luminance at each pixel and use it in place of RGB in the distance metric.

Luminance can be computed in a number of ways; we use the Y channel from the YIQ color space, where the I and Q channels are "color difference" components.

After processing in luminance space, we can recover the color simply by copying the I and Q channels of the input B image into the synthesized B' image, followed by a conversion back to RGB.

Our approach is to apply a linear map that matches the means and variances of the luminance distributions:

$$Y(p) \leftarrow \frac{\sigma_B}{\sigma_A}\,(Y(p) - \mu_A) + \mu_B$$

where $\mu_A$ and $\mu_B$ are the mean luminances, and $\sigma_A$ and $\sigma_B$ are the standard deviations of the luminances, both taken with respect to the luminance distributions in A and B, respectively.
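A sketch of this preprocessing; the YIQ matrix is the standard NTSC transform, and variable names mirror the equation above.

```python
import numpy as np

RGB_TO_YIQ = np.array([[0.299,  0.587,  0.114],   # Y (luminance)
                       [0.596, -0.274, -0.322],   # I (color difference)
                       [0.211, -0.523,  0.312]])  # Q (color difference)

def luminance(rgb: np.ndarray) -> np.ndarray:
    """Y channel of YIQ for an (h, w, 3) RGB image."""
    return rgb @ RGB_TO_YIQ[0]

def remap_luminance(Y_A: np.ndarray, Y_B: np.ndarray) -> np.ndarray:
    """Y(p) <- (sigma_B / sigma_A) * (Y(p) - mu_A) + mu_B, applied to A's
    luminance so its distribution matches B's mean and variance."""
    mu_A, mu_B = Y_A.mean(), Y_B.mean()
    sigma_A, sigma_B = Y_A.std(), Y_B.std()
    return (sigma_B / sigma_A) * (Y_A - mu_A) + mu_B
```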

Page 17: Image Analogies

Application

Traditional image filters

Page 18: Image Analogies

Application

Improved texture synthesis

The algorithm we have described, when used in this way for texture synthesis, combines the advantages of the weighted L2 norm and Ashikhmin's algorithm.

For example, the synthesized textures shown in the next figure have a similar high quality to those of Ashikhmin's algorithm, without the edge discontinuities.

Page 19: Image Analogies

Improved texture synthesis

Page 20: Image Analogies

Application

Super-resolution

Image analogies can be used to effectively "hallucinate" more detail in low-resolution images, given some low- and high-resolution pairs (used as A and A') for small portions of the image.

Training data is used to specify a "super-resolution" filter that is applied to a blurred version of the full image to recover an approximation to the higher-resolution original.
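A hedged sketch of this setup, reusing the create_image_analogy sketch from earlier; the blur used to simulate the low-resolution input is an illustrative assumption.

```python
from scipy.ndimage import gaussian_filter

def super_resolve(sharp_crop, blurred_full_image, levels=4):
    A = gaussian_filter(sharp_crop, sigma=1.0)  # unfiltered: blurred training crop
    A_prime = sharp_crop                        # filtered: the sharp original
    B = blurred_full_image                      # blurred version of the full image
    # B' approximates the higher-resolution original of the full image
    return create_image_analogy(A, A_prime, B, levels)
```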

Page 21: Image Analogies

Super-resolution

Page 22: Image Analogies

Application

Texture transfer

We filter an image B so that it has the texture of a given example texture A'.

We can trade off the appearance between that of the unfiltered image B and that of the texture by introducing a weight into the distance metric that emphasizes similarity of the (A, B) pair over that of the (A', B') pair (a sketch of this weighting follows below).

For better results, we also modify the neighborhood matching by using single-scale 1×1 neighborhoods in the A and B images.
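One plausible way to realize the trade-off described above, sketched as a weighted sum. The paper introduces a weight into the distance metric, but the exact form here (a convex combination, with invented helper names) is an assumption.

```python
def texture_transfer_distance(l, p, q, w, dist_AB, dist_ApBp):
    """w near 1 emphasizes matching the unfiltered pair (A, B), preserving
    the target image's appearance; w near 0 emphasizes (A', B'), i.e., texture."""
    return w * dist_AB(l, p, q) + (1 - w) * dist_ApBp(l, p, q)
```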

Page 23: Image Analogies

Texture transfer

Page 24: Image Analogies

Application

Artistic filters

For the color artistic filters in this paper, we performed synthesis in luminance space, using the preprocessing described.

For line art filters, using steerable filter responses in feature vectors leads to significant improvement.

We suspect that this is because line art depends significantly on gradient directions in the input images.

Page 25: Image Analogies
Page 26: Image Analogies
Page 27: Image Analogies
Page 28: Image Analogies
Page 29: Image Analogies
Page 30: Image Analogies

Application

Texture-by-numbers

It allows new imagery to be synthesized by applying the statistics of a labeled example image to a new labeling image B.

A major advantage of texture-by-numbers is that it allows us to synthesize from images for which ordinary texture synthesis would produce poor results (a usage sketch follows below).
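A usage sketch of the texture-by-numbers setup; the filenames and the create_image_analogy entry point are hypothetical placeholders.

```python
from imageio.v3 import imread

A = imread("labels_example.png")        # hand-painted labels for the example photo
A_prime = imread("photo_example.png")   # the example photograph itself
B = imread("labels_new.png")            # a new arrangement of the same labels
B_prime = create_image_analogy(A, A_prime, B, levels=4)  # "paints" the new labels
```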

Page 31: Image Analogies
Page 32: Image Analogies
Page 33: Image Analogies

The End

http://grail.cs.washington.edu/projects/image-analogies