detecting doctored images using camera response normality

6
Detecting Doctored Images Using Camera Response Normality and Consistency Zhouchen Lint Rongrong * Xiaoou Heung-Yeung [email protected] [email protected] [email protected] Microsoft Research Asia, Sigma Building, Zhichun Road 100080, P.R. China Fudan University, Road Shanghai 200433, P.R. China Abstract The advance in editing techniques has facil- itated people in synthesizing realistic imageshideos that may hard to be distinguished from real ones by visual ex- amination. This poses a problem: how to differentiate real imageshideosfrom doctored ones? This is a serious prob- lem because some legal issues may occur is no reli- able way for doctored detection when human inspection fails. Digital watermarking cannot solve this problem completely. Wepropose an approach that computes the response functions of the camera by selecting appropri- ate patches in different ways. An image may be doctored if the are abnormal or inconsistent to each The normality of the response functions is by a trained support vector machine (SVM). Experiments show that our method is effective for high-contrast images with many textureless edges. 1 Introduction Recently, numerous editing techniques to name just a few) have been developed so that realistic synthetic can be produced conve- niently. With skillful human interaction, many synthesized are difficult to be distinguished from real ones even by close visual examination. While greatly en- riching user experience and reducing production cost, real- istic synthetic imageshideos may also cause problems. Do they reflect the real situations? Do they convey correct in- formation? If they are misused, people may be misled, or be cheated the B. event or even be troubled by the rumor incurred by the synthesized Therefore, there should also be technologies to determine whether an is synthesized. Human examina- tion, although powerful, may not suffice. At a first glance, may be the solution. However, it is not the complete solution. First, doctored detection is different from digital right man- agement in which is used. The former aims was a visiting student to Microsoft Research Asia when this paper was written. at telling whether an is real or not, where every component can belong to the same owner. While the latter aims at telling whether an belongs to an owner, where the can still be synthesized. Second, as commodity cameras do not supply the func- tion of injecting watermarks as soon as the imageshideos are captured, people may find it inconvenient to protect their photos by injecting watermarks on computers. Con- sequently, there are huge amount of imageshideos without watermarks. Third, whether watermark can sustain heavy editing that is beyond simple diffusion is applied in and alpha-matting [2, is common in synthesis) is still uncertain. As a result, we do not favor watermarking-based approaches. Doctored imageshideos can be detected in several lev- els. At the highest level, one may analyze what are in- side the the relationship between the objects, etc.. Even very advanced information may be used, such as George Washington cannot take photos with George Bush, and human cannot walk outside a space shuttle without spe- cial protection. At the middle level, one may check the im- age consistency, such as consistency in object sizes, color temperature, shading,shadow,occlusion, and sharpness. At the low level, local features may be extracted for analysis, such as the quality of edge fusion, noise level, and water- mark. Human is very good at high level and middle level analysis and has some ability in low level analysis. In con- trast, at least in recent years, computers stillhave difficulties in high level analysis. Nevertheless, computers can still be helpful in middle level and low level analysis, as comple- ment to human examination when the visual cues are miss- ing. et al. have done some pioneering work on this problem. They basically test the statistics of the images They test the interpolation relationship among the nearby pixels if resampling happens when synthesis, the double quantization effect of compression after the images are synthesized, the gamma consistency via blind gamma estimation using the bicoherence, and the signal to noise ratio (SNR) consistency. And most recently, they have also proposed checking the Color Filter Array (CFA) inter- polation relationship among the nearby pixels Their Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05) 1063-6919/05 $20.00 © 2005 IEEE

Upload: others

Post on 12-May-2022

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Detecting Doctored Images Using Camera Response Normality

Detecting Doctored Images Using Camera Response Normality and Consistency

Zhouchen Lint Rongrong * Xiaoou [email protected] [email protected] [email protected]

Microsoft Research Asia, Sigma Building, Zhichun Road 100080,P.R. ChinaFudan University, Road Shanghai 200433, P.R. China

AbstractThe advance in editing techniques has facil-itated people in synthesizing realistic imageshideos thatmay hard to be distinguished from real ones by visual ex-amination. This poses a problem: how to differentiate real imageshideosfrom doctored ones? This is a serious prob-lem because some legal issues may occur is no reli-able way for doctored detection when humaninspection fails. Digital watermarking cannot solve thisproblem completely. Wepropose an approach that computesthe response functions of the camera by selecting appropri-ate patches in different ways. An image may be doctored ifthe are abnormal or inconsistent to each

The normality of the response functions isby a trained support vector machine (SVM). Experimentsshow that our method is effective for high-contrast images with many textureless edges.

1 IntroductionRecently, numerous editing techniques

to name just a few) have been developed so thatrealistic synthetic can be produced conve- niently. With skillful human interaction,many synthesized

are difficult to be distinguished from real ones even by close visual examination. While greatly en-riching user experience and reducing production cost, real- istic synthetic imageshideos may also cause problems. Dothey reflect the real situations? Do they convey correct in-formation? If they are misused, people may be misled, or becheated the B. event or even be troubledby the rumor incurred by the synthesized Therefore, there should also be technologies to determine whether an is synthesized. Human examina- tion, although powerful, may not suffice.

At a first glance, may be the solution.However, it is not the complete solution. First, doctored

detection is different from digital right man- agement in which is used. The former aims

was a visiting student to Microsoft Research Asia when this paperwas written.

at telling whether an is real or not, where everycomponent can belong to the same owner. While the latteraims at telling whether an belongs to an owner,where the can still be synthesized. Second, ascommodity cameras do not supply the func- tion of injecting watermarks as soon as the imageshideosare captured, people may find it inconvenient to protecttheir photos by injecting watermarks on computers. Con- sequently, there are huge amount of imageshideos withoutwatermarks. Third, whether watermark can sustain heavyediting that is beyond simple diffusion is applied in and alpha-matting [2, is common in

synthesis) is still uncertain. As a result, we donot favor watermarking-based approaches.

Doctored imageshideos can be detected in several lev- els. At the highest level, one may analyze what are in-side the the relationship between the objects, etc.. Even very advanced information may be used, such asGeorge Washington cannot take photos with George Bush, and human cannot walk outside a space shuttlewithout spe-cial protection. At the middle level, one may check the im-age consistency, such as consistency in object sizes, color temperature, shading,shadow,occlusion, and sharpness. Atthe low level, local features may be extracted for analysis, such as the quality of edge fusion, noise level, and water-mark. Human is very good at high level and middle levelanalysis and has some ability in low level analysis. In con-trast, at least in recent years, computers stillhave difficultiesin high level analysis. Nevertheless, computers can still behelpful in middle level and low level analysis, as comple-ment to human examination when the visual cues are miss- ing.

et al. have done some pioneering work on thisproblem. They basically test the statistics of the images

They test the interpolation relationship among thenearby pixels if resampling happens when synthesis, thedouble quantization effect of compression after the images are synthesized, the gamma consistency via blindgamma estimation using the bicoherence, and the signal to noise ratio (SNR) consistency. And most recently, they have also proposed checking the Color Filter Array (CFA) inter-polation relationship among the nearby pixels Their

Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05)

1063-6919/05 $20.00 © 2005 IEEE

Page 2: Detecting Doctored Images Using Camera Response Normality

approaches are effective in some aspects, but are by nomeans always reliable or form a complete solution. For ex-ample, resampling test fails when the two images are notresampled or resampled with the same scale. The doublequantization effect does not happen if two images are com-pressed in the same quality. The blind gamma estimation and the SNR test may fail when the two images come fromthe same camera or the kurtoses of the noiseless image andnoise are not known a priori. And the CFA checking mayrequire a knowledge of the algorithm ofthe

We propose a new method to detect doctored images. Itis based on recovering the response function of the cameraby analyzing the edges of an image. The algorithm to re-cover the camera response function is borrowed fromwhich examines the patches along edges. Our idea is that:

If the image is real, then different sets of particularpatches should result in the same normal camera re-sponse functions. Therefore, if the response functionsare abnormal or inconsistentwith each other, then theimage may be doctored.

In comparison with Popescu and work [11,our approach checks different low-level cues of images,

the response function of the camera. Although theblind gamma estimation method may also recover thegamma of the response function, in reality few camera re-sponse function is exactly a gamma curve 8, 7,even when the manufacturer may design so. As a result, theestimated gamma may vary significantly on different partsof the image (see Figure 6 of even the image is orig-inal, the detection unreliable. Moreover, the blindgamma estimation method tests regions of an image so thatthe Fourier transform can be applied. In contrast, our ap-proach checks the edges of an image. Therefore, they canbe used in differentsituations.Finally, in principle the blindgamma estimation should compute the 4D bicoherence inorder to detect the tampering on 2D images. Such compu-tation is formidable. As a result, Popescu and resortto row-wise (or column-wise)gamma estimation that onlyrequires 2D bicoherence. Therefore, if the tampered regionis surrounded by original regions, then the tampering maynot be detected. Unfortunately, such kind of tampering isthe most common.

2 Principles of Our Approach2.1 The response function re-

covery algorithmThe camera response function is the mapping relationshipbetween the pixel irradiance and the pixel value. For cam-

'We do notknow the detailsof thisalgorithm. The paper is temporarilyunavailable.

Figure 1: The principleof responsefunctionrecov-ery algorithm, adapted from (a) The first andsecond columns of pixels image the scene regionwith constant radiance and the lastcolumn im-ages the scene region with radiance The thirdcolumn images both regions. (b) The irradiancesof pixels in the first two columns map to the samepoint in RGB space, while those in the last col-umn map to The colors of the pixels in thethird column is the linear combinationof and(c) The camera response function warps the linesegment in (b) into a curve. The algorithm in [6] isto recover the linear relationshipin (b).

eras with color CCD sensors, each R, G, and B channel hasa response function.

Suppose a pixel is on an edge, and the scene radiancechanges across the edge and is constant on both sides ofthe edge (Figure Then the irradiance of the pixelon the edge should be a linear combination of those of thepixels clear off the edges (Figure Due to nonlinearresponse of the camera, the linear relationship breaks upamong the read-out values of these pixels (Figure (c)).Lin

[6] utilized this property to compute the inverse cam-era response function, find a function g = to mapthe RGB colors back to irradiance, so that the linear rela-tionshiparound edges is recovered at best. They formulatedthe problem into finding the MAP solution to:

where = is the set of observationsand are the colors of the non-edge regions

and the edge pixel, respectively), is the prior of the in-verse response function represented by a Gaussian Mix-ture Model which is learnt from the databaseand the likelihood is defined as:

in which measures how well the linear relationshipis recovered:

Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05)

1063-6919/05 $20.00 © 2005 IEEE

Page 3: Detecting Doctored Images Using Camera Response Normality

Figure 2: Some typical inverse response curves in DoRF. Each imaging media has three response functions for R, G, and B channels, respectively.

To compute ,the algorithm needs to find patches alongcan be formed, to meet the following re-edges, so that

quirements:

The areas of the two partitioned non-edge regions (re- gions with colors or in Figure should beas close to each other as possible.

2. The color variance inside the two non-edge regions should be as small as possible.

3. The difference between the mean of theedge regions should be as large as possible.

4. The colors of the pixels inside the edge region should be between those in the non-edge regions.

If the color range in the selected patches are not too nar- row, the recovered inverse response functions are reported to be quite accurate

2.2 Our doctored image detectionalgorithmUsually, the camera response functions have the followingproperties (Figure 2, also see the results in [6, 7, andthe DoRF database [

All the response functionsshould be monotonically in-creasing.

2. All the response functions, after mild smoothing,should have at most one inflexion point.

3. The response functions of R, G, and B channels should be close to each other.

If the image formation does not comply with the physical process, we may expect that the recovered inverse response functionsexhibit some abnormality or inconsistency.

To examine a suspectable image, the user may selectsome pixels along the edges that might be the boundary ofblending the images, or along the edges of different objects.Our system will select around the selected pixel an optimal

Figure 3: An example of bad automatic patch se-lection. The patches (indicated by small squares) along the smoke should not be chosen. And thepatches may not gather around desired edges.

patch that best complies with the requirements in last sub-section. However, if all the scores of patches around thechosen pixel are too low, then no patch is selected. We pre-fer manual selection because automatic selection may notselect good patches that are best along the edges of objectocclusion or texture and the patches may not be close to the desired edges. Image segmentation can offer some help, but the results may not always agree with high-level un-derstandings. Figure 3 shows examples of bad patches via automaticselection.

With the selected patches, the algorithm proposed by Linet [6] is applied to compute the inverse response func-tions (i = 0 x 1) of R, G , and Bchannels. Then we evaluate whether they are normal, ac-cording to the above-mentioned properties of normal re-sponse functions, which also hold for inverse response func-tions. Therefore, the following three features are extractedfor every trio of the inverse response functions:

1

0

= -

1

where

=

= the number of intervals on which = 0,

It is easy to see that penalizes non-monotone discourages functions with more than one inflexion

point, and encourages the inverse response functions of R, G, and B channels to be close to each other.

To decide whether a trio of the inverse response func-tions is normal or not, we have trained an SVM usingthe curves in the DoRF Database (eliminating those

Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05)

1063-6919/05 $20.00 © 2005 IEEE

Page 4: Detecting Doctored Images Using Camera Response Normality

Figure 4: The distribution of normal inverse re-sponse functions and abnormal ones in the fea-ture space. The positive samples cluster on theline that 0. While the negativesamples are quite dispersed.

that areclose to linear)as positive samplesand the abnormalcurves collected when we try our algorithm (not includingthose shown as our experimental results) as negative sam-ples, using the 3D feature vector TheSVM also produces the confidence c of the classification.Negative c indicates abnormality, while positive c impliesnormality. The larger the is, the higher the confidence is.A sample distribution of normal inverse response functionsand abnormal ones in the feature space is shown in Fig-ure 4. One can easily see that the normal inverse responsefunctionsare quitecompact in the feature space. This wouldmake the SVM classification reliable.

3 Experimental ResultsWe give examples to show the effectiveness of our method.Two of the test images are Figures and (bl). They aretaken from where Figure is a real image, whileFigure is a doctored image. Visual examination ishard to tell which is real and which is doctored. Figure 5also shows different patch selection strategies (two exam-ples are shown in Figures and and the corre-spondingrecovered inverseresponse functions. One can seethat when the image is real, the recovered inverse responsefunctionsare all normal (the output of our SVM varies from0.53 to 1.77) and close to each other (FiguresWhen the image is doctored, some inverse response func-tions become abnormal (Figures andwhere the patches along the synthesis edge are selectedforcomputation. Moreover, their shapes change significantlywith different patch selection (Figure

In Figure 6, more examples are shown. They are synthe-sized by us using the Lazy Snapping tool The tool cansegment objects easily and also estimate the alpha channelalong the segmentation boundaries. When the patches areselected from the background, which is unaltered, the in-

verse response curves are all normal (FiguresWhen the patches are along the synthesis edge, the inverseresponse curves are all abnormal (Figures

The above examples show that the abnormality and con-sistency of response functions is good indicator of whetheran image is synthesized from other images.

4 Discussions and Future WorkWith the improvement of editing technologies,realistic can be synthesized easily. Suchfooling have caused some problems. Wehave proposed an algorithm for doctored image detection,which is based on computing the inverse camera responsefunctionsby selecting appropriatepatches along edges. Theexperiments show that our algorithm is effective on somekinds of images that the calibration algorithm proposed byLin et [6] can work. Typically, the image should be ofhigh contrast, so that the total color range of the selectedpatches is wide enough. And it should also have manyedges across which the regions are homogeneous, so thatenough patches that meet the requirements in Section 2.1could be found. However, even so, our approach may stillfail if the component images are captured by the same cam-era and these components are not synthesized along objectedges. Moreover, to apply our algorithm, one should alsopay attention to the numbers of patches in foreground,back-ground or along the suspectable synthesis edges, so thatthey are balanced, Otherwise,the results might be biased.

Our algorithm assumes that the camera response func-tion is spatially invariant in the same image. However, sometypes of cameras, such as HP Photosmart cameras andcameras using CMOS adaptive sensors, have adaptive re-sponse functions in order to produce more pleasing photosof high contrast scenes. Clearly, the images taken by suchcameras will be mis-classified as doctored.

Due to the above-mentioned limitations,we are trying toimprove our detection algorithm by integrating morelevel cues. Moreover,doctored video detectionis also worthexploring,although currently video synthesis is not as suc-cessful as image synthesis.

Acknowledgment: The authors would like to thank Dr.Singbing Kang for his valuable communication, SteveLin for sharing us the calibration code, Dr. Yin forsharing us the Lazy Snapping tool, and Dr. Sun andDr. Aseem Agarwala for sharing us images.

ReferencesA. et Interactive Digital Photomontage.Pro-ceedings

[2] Chuang, B. D.H. and R.A Bayesian Approach to Digital Matting. Proceedings ofCVPR 2001, 264-271.

Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05)

1063-6919/05 $20.00 © 2005 IEEE

Page 5: Detecting Doctored Images Using Camera Response Normality

c = 1.24

(a9) FUSUB, c = 0.53

(a4) S, c = 1.77

(a6) B, c = 1.11

BUS, c = 1.36

(b3) F, c = 0.32

FUS, c -1.55

(b7) FUB, c = 0.66

(b9) FUSUB, -3.40

c -2.73

(b6) B, c = 1.55

BUS, c -3.43

Figure5: Test images, patchselectionexamples, hecomputed inverseresponse (al)A realimage. (bl) A doctored image. Examples of patch selection,where the patchesare selected inforegroundonlyandalongthesynthesisedgeonly, respectively. The rest imagesarethe computedinverseresponse functions with different patch selection. The patch selection strategy and the confidence c ofSVM classification are shown at the bottom, where F,S, and represent Foreground, SynthesisEdge, andBackground, respectively. means that the patchesare selected in both foregroundand background.Other abbreviationcan be understoodsimilarly.

Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05)

1063-6919/05 $20.00 © 2005 IEEE

Page 6: Detecting Doctored Images Using Camera Response Normality

(al) B, c = 0.20

(a2) S, c = -0.25

B, c = 0.52 B, c = 0.71

S, c -9.64

Figure 6: More doctored images (a)-(c) and the computed inverse response functions with differentpatch selection strategies. The inverse response functions computed using the patches inthe background only. The inverse response functions computed using the patches along thesynthesis edges only. The superimposed objects in (a)-(c) are indicatedby rectangles or arrows.

[3] M. and A. Blake. Poisson Image Editing.Proceedings Siggraph 2003, 313-318.

[4] J. Sun, J. Jia, Tang, and Shum. Poisson Matting.Proceedings of ACM Siggraph 2004, 3

[5] Y. Li, J. Sun, C.-K. Tang, and H.-Y. Shum. Lazy Snapping.Proceedings of ACM Siggraph 2004, pp. 303-308.

[6] Lin, J. Gu, Yamazaki, and H.-Y. Shum. Radiomet-ric Calibration from a Single Image. Proceedings of CVPR

P.E. Debevec and J. Malik. Recovering High DynamicRange Radiance Maps from Photographs. Proceedings ofACM Siggraph 1997,

[8] T. Mitsunaga and S.K. Nayar. Radiometric Self Calibration.Proceedings of CVPR 1999,

[9] M.D. Grossberg and S.K. Nayar. What Is the Space ofCamera Response Functions? Proceedings of CVPR 2003,

Lee and Jung. A Survey of WatermarkingTechniques Applied to Multimedia. Proceedings of 2001

2004,

IEEE International Symposium on Industrial Electronics

A.C. Popescu and H. Statistical Tools for DigitalForensics. 6th International Workshop on Information Hid-ing, Toronto, Canada,A.C. Popescu and H. Farid. Exposing Digital Forgeries inColor Filter Array Interpolated Images. IEEE Transactions on Signal Processing, in review.M.D. Grossberg and S.K. Nayar. Databaseof Response Functions Available at:

SVM toolbox. at:

D.L. Ward. Photostop. Available at:

[ Hewlett-Packard Company. HP PhotosmartCameras Adaptive Lighting. Available at:http://h 10025 1

1,

Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05)

1063-6919/05 $20.00 © 2005 IEEE