
Detecting Forgery From Static-Scene Video Based on Inconsistency in Noise Level Functions

Michihiro Kobayashi, Takahiro Okabe, Member, IEEE, and Yoichi Sato, Member, IEEE

Abstract—Recently developed video editing techniques have enabled us to create realistic synthesized videos. Therefore, using video data as evidence in places such as courts of law requires a method to detect forged videos. In this study, we developed an approach to detect suspicious regions in a video of a static scene on the basis of noise characteristics. The image signal contains irradiance-dependent noise whose variance is described by a noise level function (NLF) as a function of irradiance. We introduce a probabilistic model providing the inference of an NLF that controls the characteristics of the noise at each pixel. Forged pixels in regions clipped from another video camera can be differentiated by using maximum a posteriori estimation for the noise model when the NLFs of those regions are inconsistent with the rest of the video. We demonstrate the effectiveness of the proposed method by applying it to videos recorded indoors and outdoors. The proposed method enables us to evaluate the per-pixel authenticity of a given video with high accuracy, achieving denser estimation than prior work based on block-level validation. In addition, the proposed method can be applied to various kinds of videos, such as those contaminated by large noise or recorded with any scan format, conditions that limit the applicability of existing methods.

Index Terms—Expectation maximization (EM) algorithm, forgery detection, maximum a posteriori (MAP) estimation, noise analysis, noise level function (NLF).

I. INTRODUCTION

HOW can one be assured of the authenticity of a digital image? For example, when digital photographs are used as testimony in courts of law, how is it possible to distinguish between genuine and falsified evidence? Given the recent progress and development of digital editing techniques that can be used to synthesize realistic images, it is difficult to guarantee the authenticity of digital photographs.

In the past, digital watermarking was the main technology used to ensure authenticity [1] (e.g., preventing illegal copying of images from the Internet). However, it is impractical to embed digital watermarks in all images, and therefore, digital watermarking is limited in its ability to ensure authenticity.

In response to the limitations of watermarking, a number of forgery detecting techniques have been developed that exploit the correlation and the inconsistencies in forged images [2].

Manuscript received January 18, 2010; revised August 17, 2010; accepted August 26, 2010. Date of publication September 13, 2010; date of current version November 17, 2010. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Mark (Hong-Yuan) Liao.

The authors are with the Institute of Industrial Science, The University of Tokyo, Tokyo 153-8505, Japan (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIFS.2010.2074194

Johnson and Farid used inconsistencies in lighting [3] and chromatic aberration [4]. Lin et al. estimated a camera response function and verified its uniformity across an image [5]. Lukáš et al. extracted fixed-pattern noise from an image and compared it with a reference pattern [6]. Fridrich et al. computed the correlation between segments in an image and detected cloned regions [7]. Ye et al. estimated a JPEG quantization table and evaluated its consistency [8]. These digital image forensic methods help us to assess the authenticity of static digital images.

How can one be assured of the authenticity of a video taken with a camcorder? In tandem with static image forgery detection, detecting forgery in a video is also an important issue, and research focused on digital video forensics has only just begun.

To provide some context, tampering methods for videos containing static scenes recorded on a surveillance camera can be classified into two approaches: 1) intravideo forgery—replacing regions or frames with duplicates from the same video sequence to hide unfavorable objects in a scene by overwriting them with the background from other segments of the same video; and 2) intervideo forgery—clipping objects from other images or video segments and superimposing them on desired regions in the video.

Wang and Farid studied a method for detecting intravideo forgery [9]. Since duplication yields high correlation between original regions and cloned ones, detecting unnaturally high coherence is useful for discovering copy–paste tampering. However, their proposed method has a serious limitation in that it can only detect copy–paste tampering from the same video sequence. That is, it cannot be used to detect superimposition caused by inserting objects from other videos.

In contrast, our proposed method can detect superimposition generated from video not contained in the original sequence. Specifically, our method uses noise inconsistencies between the original video and superimposed regions to detect forgeries. We exploit the nature of photon shot noise mixed into image signals, which depends on the camera model and recording parameters. Photon shot noise results from the quantum nature of photons: the variance of the number of photons coming into a camera is tied to the mean, following a Poisson distribution. Therefore, this correlation between the variance and the mean (characteristic of photon shot noise) can be used as a powerful clue to detect inconsistencies in forged videos.

A CCD camera converts photons into electrons and finally into bits; therefore, the relationship between the variance and the mean of the number of photons is converted into that between the variance and the mean of the observed pixel value.


Fig. 1. Diagram of noise characteristics. The solid (forged region) and the dashed (unforged region) lines are the NLFs. The dots stand for the noise characteristics calculated at each pixel from a video clip.

This relationship is formulated as the noise level function (NLF) by Liu et al. [10]. The NLF depends on parameters inherent to the camera as well as on the recording parameters. Consequently, by comparing the relationships of the pixel values in a video clip, we can detect forged regions clipped from another video.

Our method targets video scenes recorded by a static surveillance camera, and we detect forgeries as follows. Superimposing stationary objects is not only the easiest way to tamper with a video, but viewers also pay less attention to stationary objects than to moving ones, so the proposed method is useful for supporting verification. Given an input video that contains some forged regions, we first analyze the noise characteristics at each pixel. Fig. 1 shows a diagram of the noise characteristics for the forged region and the unforged region. The solid (e.g., forged region in the video) and the dashed (e.g., unforged region) lines are the NLFs of the two distributions. Each dot in the figure represents a noise characteristic, i.e., the variance versus the mean of pixel values, computed for each pixel. Once we obtain the per-pixel noise characteristics, the NLFs are fitted to the distribution using maximum a posteriori (MAP) estimation. The likelihood is defined using the chi-square distribution to deal with the fluctuation in the noise characteristics resulting from a limited amount of sampled data. We simultaneously estimate the posterior probability of forgery (at every pixel) and the parameters of the NLF using the expectation maximization (EM) algorithm. We represent an NLF as a linear combination of basis functions in a similar manner to Liu et al. [10]; we synthesize a number of NLFs corresponding to various camera response functions (CRFs) and noise parameters and obtain a set of linear basis functions via principal component analysis (PCA).
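The first step of this pipeline, computing the per-pixel noise characteristics, reduces to taking the temporal mean and variance of each pixel over the frames of the static-scene clip. The following is a minimal sketch of that step, assuming the frames are already loaded into a NumPy array; the function and variable names are illustrative, not taken from the paper.

```python
import numpy as np

def noise_characteristics(frames: np.ndarray):
    """Per-pixel temporal mean and unbiased variance.

    frames: array of shape (F, H, W) holding F grayscale frames of a
    static-scene clip. Returns (mean, variance), each of shape (H, W).
    """
    mean = frames.mean(axis=0)
    var = frames.var(axis=0, ddof=1)  # unbiased sample variance over time
    return mean, var

# Toy usage with synthetic data: a constant scene plus Gaussian noise.
rng = np.random.default_rng(0)
scene = rng.uniform(50, 200, size=(48, 64))
frames = scene + rng.normal(0.0, 5.0, size=(285, 48, 64))
mu, s2 = noise_characteristics(frames)
```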

The primary contribution of this work is using noise characteristics to discover image inconsistencies (forgeries) in videos. We apply the proposed method to tampered videos to confirm that our method can properly detect forged regions. We demonstrate that our method can detect the region superimposed from another video by checking the difference in the noise characteristics between each region. Unlike other digital forensics methods, our method is robust to noise mixed into input data because we make use of the characteristics of the noise itself.

The rest of this paper is organized as follows: Section II reviews the digital forensic methods for images and videos. In Section III, we formulate the relationship between the mean and the variance of pixel values. The probabilistic model of forgery and the estimation method are introduced in Section IV. Section V provides experimental results, and lastly, we give our conclusions in Section VI.

II. RELATED WORK

In this section, we summarize several approaches that have been developed to detect forgeries in digital images and videos. We also refer to research that uses noise in images and videos as a source of information.

A. Forgery Detection in Images

Image tampering methods can be classified into two approaches: 1) intraimage forgery—replacing regions with others in the same image; and 2) interimage forgery—superimposing regions clipped from other images.

Fridrich et al. were the first to attempt to detect forgeries in images [7]. This method targets intraimage forgery, which usually yields an unnaturally high correlation between duplicated regions. They introduced a detection method based on robust block matching, which was carried out by using discrete cosine transform (DCT) coefficients in order to deal with lossy JPEG compression.

Subsequent approaches have targeted interimage forgeries by verifying the uniformity of certain characteristics in an image to detect forgery. Johnson and Farid developed a method based on optical clues [3]. They estimated the distribution of light sources illuminating each object by using the observed brightness and calculated surface normals along the object's occluding contours, and then investigated the consistency of the estimated illumination distributions. This work showed that illumination distributions estimated from an image help to differentiate tampered objects in the image.

Johnson and Farid also developed a method for detecting forgeries on the basis of lateral chromatic aberration [4], i.e., a spatial shift of light passing through the optical system due to differing refraction between wavelengths. Global model parameters that determine the displacement of lateral chromatic aberration at each pixel were estimated, and the degree of tampering was evaluated by calculating the average angular error between the displacement vector determined by the global model parameters and the displacement vector computed locally.

Lin et al. developed a method to examine the consistency in camera response functions estimated on the basis of intensity change along edges [5]. While the brightness of an edge should be a linear combination of those from the surfaces on either side, a nonlinear camera response skews the linearity of the mixture of brightness. This approach estimates the nonlinear inverse response functions that convert a nonlinear relationship of observed pixel values on the edge into a linear relationship. If the function estimated from an edge does not conform to the rest of the image, the edge is marked as a sign of tampering.

Ye et al. developed a method to detect inconsistencies in an image on the basis of a blocking artifact measure for image compression [8]. If blocks compressed with different quantization tables are combined in an image, the blocking artifact measure of the forged block is much larger than that of an authentic block. They estimated the quantization table from the histogram of DCT coefficients and evaluated the blocking artifact measure of each block.


Lukáš et al. developed a method to verify the pattern of the noise distribution [6]. Due to the sensor imperfections developed during the manufacturing process, a CCD camera contains pixels with differing sensitivity to light. This spatial variation of sensitivity is temporally fixed and known as fixed pattern noise. Since this nonuniformity is inherent in a camera, one can exploit it as a type of fingerprint. They determined the reference noise pattern of a camera by averaging the noise extracted from several images. Given an image, they extracted fixed pattern noise from the image using a smoothing filter and identified the camera that took the image. They also developed a method for detecting forgeries in an image using the same approach [6].

B. Forgery Detection in Videos

To detect video forgery, one may think of applying an image forgery detection method to each frame of a given video sequence. However, some types of forgery cannot be detected in this manner due to a lack of consideration of the relationship between frames. For instance, simple duplication is undetectable since each frame appears to be authentic if evaluated independently.

Compared to the image forensic techniques mentioned above, only a few techniques have been developed for videos, but this field of research is certainly growing. Similar to those for an image, forgery detection techniques for a video are classified into two types: intervideo and intravideo approaches.

As mentioned earlier, detection of replacement and duplication in videos has been studied by Wang and Farid [9]. They have also developed an inconsistency-based detection method that checks the consistency of the deinterlacing parameters used to convert an interlaced video into a noninterlaced form [11]. Since interlaced videos have half the vertical resolution of the original video, the deinterlacing process fully exploits insertion, duplication, and interpolation of frames to create a full-resolution video. In their method, the parameters of the interpolation and the posterior probability of forgery are estimated simultaneously by using the EM algorithm. They also suggested that the motion between fields of a frame is closely related across fields in interlaced videos. Evaluating the interference to this relationship caused by tampering allows their system to detect forgeries in an interlaced video. While their method can detect regions superimposed from other video sequences, it restricts the form of the video, that is, to deinterlaced or interlaced video.

The correlation of noise in a video has also been explored to detect forgery. Hsu et al. developed a method on the basis of noise characteristics extracted by noise reduction [12]. They exploited the block-level correlation of the noise residual as the characteristics of a video. If a region is inpainted with another region in the same video, the correlation between the regions takes an unnaturally high value. In contrast, the noise residual of a textured region synthesized from another video exhibits low coherence with the noise residual of other regions. However, this approach greatly depends on the noise reduction method. When the noise intensities of the original and tampered regions are significantly different, it fails to reduce the noise accurately and can miss some forgeries because of the resulting error in the calculated noise residual.

We proposed a forensic technique for tampered videos based on noise characteristics [13]. We extend our prior method as follows: previously, we assumed linear NLFs, estimated the NLF of the authentic region in a least-squares manner, and then classified the pixels into authentic and tampered with respect to the distance from the estimated NLF. While the prior method can detect pixels superimposed from another video in some cases, it is not applicable to nonlinear NLFs, which are common in commercial cameras. In addition, discrimination of forgery according to the distance is not powerful enough to distinguish two similar NLFs. In this study, we deal with nonlinear NLFs with the assistance of prior knowledge of generic NLFs and introduce a probabilistic approach, instead of a deterministic one, to distinguish the authentic and the forged pixels.

In contrast to previous work, our method exploits the noise characteristics on the basis of the generative process of the noise in a video. We calculate the probability of forgery and detect suspicious regions via MAP estimation. Unlike the other methods, the proposed method is applicable to videos with other scan formats, such as progressive scan, and can deal with a combination of video sources with significantly different noise intensities. In addition, our method detects forgeries at the pixel level, giving a denser estimate than methods based on block-level validation.

C. Effective Use of Noise in Digital Data

Since the early period of digital cameras, various studies on noise have been reported in signal processing. The main purpose of this field of research is to remove noise from images and videos. On the other hand, some researchers have recently attempted to make effective use of noise rather than trying to remove it from images and videos. In their report [10], which we referred to in the definition of the NLF, Liu et al. estimated an NLF from a single image to exploit it for adaptive bilateral filtering and edge detection. Since they calculate an NLF from an image by using spatial averaging, spatial variation cannot be separated into noise and textures. In contrast, since the proposed method calculates an NLF from a video by using temporal averaging, it can separate noise from the temporal variation.

Matsushita and Lin exploited the distribution of temporal noise intensity at each pixel to estimate camera response functions (CRFs) [14]. They made use of the fact that the distribution of noise is symmetric about zero in nature, but is skewed by nonlinear CRFs. They estimated the inverse CRF that converts the distribution of the noise calculated from the observed pixel values into a symmetric one in the irradiance domain. Takamatsu et al. exploited the characteristics of noise to estimate CRFs as well [15]. They focused on the non-affine relationship between the observed pixel value and the noise variance, not on the shape of the distribution of the noise. They also developed another method to estimate CRFs on the basis of probabilistic intensity similarity [16]. The probabilistic intensity similarity is a similarity measure of observed pixel values and represents the likelihood that two pixel values originated from the same scene radiance [17].

III. NOISE CHARACTERISTICS OF CCD CAMERA

We consider the inconsistencies of the noise characteristics mixed into the signal to be a clue to tampering. In Section III-A, we first review the noise that arises during the signal processing of a CCD camera. We then introduce the NLF following the definition developed by Liu et al. [10]. Next, we describe the details of an NLF from a statistical point of view: we derive the mean and the variance of an observed pixel value sequence and show the relationship between the mean and the variance in Section III-B.

A. Definition of NLF

A CCD camera converts photons into electrons and then into bits. This process, called the radiometric CCD camera model, has been studied in the past [18], [19]. In this process, several noise sources corrupt the input signals, including photon shot noise, dark current noise, read-out noise, and quantization noise. In this work, we focus on photon shot noise for the following two reasons: 1) photon shot noise is the dominant noise source for most images of a scene (excluding those taken in extremely dark environments), and 2) the relationship between scene brightness and noise intensity is useful for forgery detection, since this relationship should be consistent in an image.

The number of photons that fall onto a CCD element fluctuates temporally, behaving as noise. Due to the quantum nature of photons, this fluctuation follows the Poisson distribution, the variance of which is closely related to its mean. Therefore, the noise intensity (the temporal variance of pixel values) depends on the mean of the pixel values. In this paper, we assume that the distribution that photon shot noise obeys is Gaussian, because the number of photons is large enough to approximate the Poisson distribution by a Gaussian distribution. Note, however, that while the mean and the variance of a Gaussian distribution are independent of each other, those of the photon shot noise we consider here are related because of the characteristics of the Poisson distribution. While it is impossible to measure the distribution of photons directly, we can compute the relationship between the mean and the variance of the observed pixel values instead. Our method uses this relationship as a measure of forgery in videos.
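The variance-tracks-mean behavior of photon counts, and the Gaussian approximation for large counts, are easy to check numerically. The snippet below uses purely illustrative synthetic data, not the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(1)
for lam in (10, 100, 1_000, 10_000):
    photons = rng.poisson(lam, size=100_000)
    # For a Poisson distribution the variance equals the mean, so the
    # noise power grows with brightness; for large lam the histogram is
    # well approximated by a Gaussian with mean lam and variance lam.
    print(f"mean photon count {lam}: "
          f"sample mean {photons.mean():.1f}, sample variance {photons.var():.1f}")
```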

Here we briefly describe an NLF by following the formulation used in Liu et al. [10], where they estimate an NLF from a single image. In this work, we make the following assumptions. The first is that the input video is of a static scene, as mentioned in Section I. We assume that stationary objects clipped from another video are superimposed on this video. This assumption enables us to obtain the noise characteristics by calculating the temporal mean and the variance at each pixel. The second is zero-mean noise. This assumption suggests that the true mean value of the observed pixel values is obtained by temporal averaging.

The pixel value we observe fluctuates randomly due to the effects of noise. Let $\mu$ and $\sigma^2$ be the mean and the variance of the observed pixel values. We consider the variance as a function of the mean and define an NLF as¹

\sigma^2 = \tau(\mu).    (1)

¹Liu et al. defined the NLF as the standard deviation as a function of the mean in their study [10], but we define the NLF as the variance in this paper for ease of calculation later.

The NLF gives the variance of the pixel values as a function of their observed mean. As derived in Section III-B, an NLF depends on the inherent parameters of the camera and the recording conditions (e.g., quantum efficiency, CRF, exposure time, aperture size, and electric gain).

B. Properties of NLF

We now consider how the mean and the variance of the observed pixel values arise from the signal processing in a CCD camera model. According to Liu et al. [10], the pixel value stored in memory after analog-to-digital (A/D) conversion is formulated as

I = f\big(\alpha(qN + N_s + N_{dc} + N_r) + N_q\big)    (2)

where $f$ is the CRF, $\alpha$ is the electric gain, $q$ is the quantum efficiency of the CCD element, and $N$ is the number of photons entering the CCD during the exposure time. The following four kinds of noise source are assumed: $N_s$, $N_{dc}$, $N_r$, and $N_q$, which are photon shot noise, dark current noise, read-out noise, and quantization noise, respectively. Using the first-order approximation via the Taylor expansion of $f$ around $\alpha q N$, we obtain the observed value as

I \approx f(\alpha q N) + f'(\alpha q N)\big(\alpha(N_s + N_{dc} + N_r) + N_q\big).    (3)

From (3), the mean of $I$ is derived as

\mu = E[I] \approx f(\alpha q N)    (4)

where we assume that $E[N_s] = E[N_{dc}] = E[N_r] = E[N_q] = 0$. On the other hand, the variance of $I$ can be written as

\sigma^2 = \mathrm{Var}[I] \approx f'(\alpha q N)^2 \big\{ \alpha^2\sigma_s^2 + \alpha^2\sigma_{dc}^2 + \alpha^2\sigma_r^2 + \sigma_q^2 \big\}    (5)

where $\sigma_{dc}^2$, $\sigma_r^2$, and $\sigma_q^2$ are the variances of dark current noise, read-out noise, and quantization noise, respectively. The first term in the braces in (5) corresponds to the variance of photon shot noise, which obeys the Poisson distribution, so that $\sigma_s^2 = qN$.

Now (5) can be rewritten by substituting (4), so that

\sigma^2 = \tau(\mu) = f'\big(f^{-1}(\mu)\big)^2 \big\{ \alpha f^{-1}(\mu) + \alpha^2(\sigma_{dc}^2 + \sigma_r^2) + \sigma_q^2 \big\}.    (6)

Note that (6) represents the dependence of the variance on the mean of the pixel value, that is, (6) is the theoretical formulation of an NLF and is equivalent to (1). Equation (6) also shows that an NLF depends on various parameters, especially the derivative of a CRF. This means that an NLF is highly dependent on the shape of a CRF.
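The dependence of the NLF on the CRF can also be traced empirically by simulating the camera model for a chosen CRF and noise parameters and recording the resulting mean-variance pairs. The sketch below does this with a gamma curve standing in for a measured CRF; all names, scales, and parameter values are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def simulate_nlf(crf, max_photons=10_000, sigma_pre=0.002, sigma_q=0.001,
                 n_levels=64, n_samples=2_000, seed=0):
    """Empirically trace the mean-variance relation (an NLF) induced by
    a camera response function `crf`.

    Photon shot noise is simulated by a Poisson draw; dark-current and
    read-out noise are lumped into one zero-mean Gaussian term with
    standard deviation `sigma_pre` (in normalized irradiance units);
    quantization noise is approximated by a Gaussian with standard
    deviation `sigma_q` added after the CRF.
    """
    rng = np.random.default_rng(seed)
    means, variances = [], []
    for level in np.linspace(0.02, 0.98, n_levels):
        photons = rng.poisson(level * max_photons, size=n_samples)
        irradiance = photons / max_photons + rng.normal(0, sigma_pre, n_samples)
        pixel = crf(np.clip(irradiance, 0, 1)) + rng.normal(0, sigma_q, n_samples)
        means.append(pixel.mean())
        variances.append(pixel.var(ddof=1))
    return np.array(means), np.array(variances)

# A gamma curve as a stand-in CRF: the simulated noise level is largest
# where the curve is steepest (dark pixels for gamma encoding).
mu_grid, tau_grid = simulate_nlf(lambda x: x ** (1 / 2.2))
```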

Here we show how a CRF and an NLF are related to each other by showing several synthetic examples. Fig. 2 shows three examples of a CRF and the NLF synthesized from the CRF and synthetic noise. The solid lines in the figures are the CRFs chosen from the database of the measured CRFs mentioned in Section IV.


Fig. 2. Examples of CRFs and NLFs of synthetic noise. The solid lines and dashed lines indicate the CRFs and NLFs, respectively. (a) CRF No.27. (b) CRF No.53. (c) CRF No.164.

Fig. 3. Noise characteristics of videos of the Macbeth Color Checker Board recorded with various CRFs. The order of the images is the same as in Fig. 2. (a) CRF No.27. (b) CRF No.53. (c) CRF No.164.

The dashed lines are the simulated NLFs corresponding to the CRFs. Note that the shape of an NLF changes significantly for different CRFs. As derived in (6), an NLF indicates that noise increases in the range where the derivative of the CRF is high.

We also show in Fig. 3 the noise characteristics of a real video clip recorded with the different CRFs, calculated by (1). Each dot in the figure indicates the noise characteristics of one pixel. Note that the horizontal and the vertical axes indicate absolute, not normalized, values. The plotted points are distributed depending on the CRF, as simulated in Fig. 2. The points are broadly spread as the noise increases because we do not have a large enough number of samples (corresponding to the number of frames of the given video in our method) to calculate the variance of large noise accurately. The fluctuation of the variance caused by a small number of samples can be represented by the chi-square distribution, and we apply this distribution to formulate the likelihood of the variance in the estimation phase. We discuss this spread of the points in Section IV-B.

IV. FORGERY DETECTION FROM STATIC VIDEO SCENE

Our method computes the probability of forgery for each pixel by checking the consistency of the NLFs in forged regions and unforged regions. Those pixels with a high probability of forgery are regarded as part of a forged region. Note that when an input video is forged, it contains two groups of pixels, i.e., forged pixels and unforged pixels. To estimate the probability that a pixel came from the video source with a certain NLF, we should know the parameters of the NLF. On the other hand, the probability that a pixel belongs to an unforged/forged region is required to estimate the parameters of the NLFs of the original/superimposed video; that is, NLF-based forgery detection is a chicken-and-egg problem. Therefore, our method uses the EM algorithm for estimating the NLF for each video source and the probability of forgery for each pixel simultaneously.

Before the forgery detection phase, we introduce the basis functions of NLFs following the work by Liu et al. [10]. A large number of noise distributions are synthesized by using various CRFs from a database together with several noise parameters, and we obtain six basis functions via PCA. We represent NLFs as a linear combination of these basis functions and estimate the combination coefficients by using the EM algorithm. The details of how we obtain these basis functions are given in Section IV-A.

In the evaluation phase, given a forged video sequence, we calculate the noise characteristics, i.e., the relationship between the mean and the variance of the pixel values as in (1). The pixel-wise noise characteristics are simply obtained by temporal averaging because we assume a video of a static scene, as mentioned in Section III. Here we should consider the fluctuation in the calculated variance due to the limited number of samples, which degrades the detection performance. To deal with the distribution calculated from a small number of samples, we introduce a likelihood of a variance on the basis of the chi-square distribution in Section IV-B.

A. Basis Functions of NLFs

In this section, we consider the parameterization of possible NLFs for estimating the NLFs of a given video. As mentioned before, we calculate sample points of the noise characteristics by averaging the pixel values temporally. Generally, noise characteristics themselves are represented in a nonparametric form and are not restricted by any constraints. However, the shape of an NLF should be constrained because the variety of CRFs, which yield NLFs, is limited. To restrict the shape of an NLF and reduce the dimension of the parameters, we synthesize a number of NLFs and obtain a small number of basis functions of possible NLFs. Following the previous work by Liu et al. [10], we prepare a set of statistical basis functions of NLFs synthesized from 201 measured CRFs in the database.²


Fig. 4. Principal components of synthetic NLFs. (a) Mean function of NLFs. (b) Linear basis functions of NLFs.


Applying PCA to the synthesized NLFs, we adopt six dominant eigenvectors as basis functions following the prior work [10]; the accumulated proportion with which they represent the NLFs is 99.8%. The mean function and the basis functions are represented by eighth-order polynomials. Using a mean function $\bar{\tau}(\mu)$ and six linear basis functions $\tau_i(\mu)$, $i = 1, \ldots, 6$, an NLF is represented as

\tau(\mu) = \bar{\tau}(\mu) + \sum_{i=1}^{6} w_i \tau_i(\mu)    (7)

where $w_i$ is the weight parameter of the $i$th basis function and is estimated in a least-squares manner by the EM algorithm, as described in Section IV-C. Fig. 4 shows the mean function and the linear basis functions of the NLFs. Note that, once an NLF is obtained, the variance $\sigma^2$ can be computed for each mean $\mu$.
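A compact way to obtain such a basis is to sample each synthesized NLF on a common grid of mean values, stack the curves into a matrix, and take the dominant principal components. The sketch below does this with NumPy's SVD; it assumes the synthesized NLF curves are already available (e.g., from a simulation like the sketch above), and the function names are illustrative.

```python
import numpy as np

def nlf_basis(nlf_curves: np.ndarray, n_basis: int = 6):
    """PCA of synthesized NLFs.

    nlf_curves: array of shape (num_curves, num_grid_points); each row is
    one synthesized NLF evaluated on the same grid of mean values.
    Returns the mean function and the `n_basis` dominant basis functions.
    """
    mean_fn = nlf_curves.mean(axis=0)
    centered = nlf_curves - mean_fn
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean_fn, vt[:n_basis]

def nlf_from_weights(mean_fn, basis, w):
    """Reconstruct an NLF as the mean function plus a weighted sum of the
    basis functions, in the spirit of (7)."""
    return mean_fn + w @ basis
```

With such a basis, only the six weights per source need to be estimated in the EM stage, which keeps the M-step a small least-squares problem.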

B. Likelihood Modeling of Noise Characteristics

In this section, we describe the likelihood model of the estimated NLF. We consider the fluctuation in statistical values calculated from a small number of measurements with large variance, and how to deal with it. Applying statistics for a small number of samples, the likelihood of the estimated NLF is calculated from the mean and the variance of the given video.

The observed pixel values we analyze in this paper are contaminated by noise characterized by its mean and variance. We assume that the population of the noise follows the Gaussian distribution $\mathcal{N}(\mu, \sigma^2)$; that is, the Gaussian distribution is the population model of the noise. The proposed method checks the consistency of the parameters of the population model to detect forgery. These parameters, named the population mean $\mu$ and the population variance $\sigma^2$, respectively, are ideal values and we cannot obtain them.

On the other hand, given the input video, we can calculate the mean and the variance by averaging the sample values. These statistical measures are named the sample mean $\bar{x}$ and the sample variance $s^2$, respectively, and are distinguished from the parameters of the population model above. If we had an infinite set of samples, the sample mean and the sample variance would be exactly the same as the population mean and the population variance, respectively.

2Available: http://www.cs.columbia.edu/CAVE.

However, in the case of a finite set of samples, the sample mean and the sample variance fluctuate in every calculation. Fluctuation in the sample variance is especially crucial if the population variance is quite large compared to the number of samples. To deal with the fluctuation in the calculated measures, we introduce statistics for a small number of samples.

Let $x_1, x_2, \ldots, x_F$ be samples generated from a population that obeys the Gaussian distribution $\mathcal{N}(\mu, \sigma^2)$, which correspond to a temporal sequence of pixel values over $F$ frames at one pixel. We cannot obtain the parameters of the population because the number of obtained samples is limited to $F$. Instead, we calculate the sample mean $\bar{x}$ and the sample variance $s^2$, respectively, as

\bar{x} = \frac{1}{F} \sum_{t=1}^{F} x_t    (8)

and

s^2 = \frac{1}{F-1} \sum_{t=1}^{F} (x_t - \bar{x})^2.    (9)

Now we introduce the chi-square value defined as

\chi^2 = \frac{(F-1)\,s^2}{\sigma^2}    (10)

where $\sigma^2$ is the population variance. Note that $\sigma^2$ cannot be computed from the samples; we compute the chi-square value by assuming a certain $\sigma^2$ here and discuss how we estimate the population variance later. The chi-square value is known to obey the chi-square distribution with $F-1$ degrees of freedom, namely, $\chi^2 \sim \chi^2_{F-1}$. The probability that we obtain the chi-square value $\chi^2$ can be written as

p(\chi^2) = \frac{1}{2^{(F-1)/2}\,\Gamma\!\left(\frac{F-1}{2}\right)} \left(\chi^2\right)^{\frac{F-1}{2}-1} e^{-\chi^2/2}.

The chi-square distribution gives us the likelihood of the sample variance $s^2$ under the assumption that the population variance is $\sigma^2$. Once we obtain $s^2$ from the samples and set a certain $\sigma^2$, we can calculate $\chi^2$ from (10) and the likelihood of $s^2$, that is,

p(s^2 \mid \sigma^2) = p\!\left(\chi^2\right)\Big|_{\chi^2 = (F-1)s^2/\sigma^2}.    (11)

Here we consider how we set $\sigma^2$ in (11). Because $\sigma^2$ is the population variance of the noise described by the NLF we estimate, $\sigma^2$ can be computed from (1). The population mean $\mu$ is approximated by the sample mean $\bar{x}$ in (1) because we cannot obtain $\mu$. Substituting (1) into (11), we obtain

p\big(s^2 \mid \tau(\bar{x})\big) = p\!\left(\chi^2\right)\Big|_{\chi^2 = (F-1)s^2/\tau(\bar{x})}.    (12)

The left-hand side can be rewritten as $p(s^2 \mid \bar{x}, \tau)$; that is, (12) represents the likelihood of the sample variance $s^2$ conditioned on the NLF $\tau$, given a certain sample mean $\bar{x}$. This likelihood represents how likely the sample variance $s^2$ is under the population variance $\tau(\bar{x})$ estimated at the pixel, and it is used to estimate the parameters of the NLFs in Section IV-C.


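Numerically, this per-pixel likelihood is just the chi-square density evaluated at the scaled sample variance. Below is a minimal sketch using SciPy, following the reconstruction of (10)-(12) above; the helper name and the clamping constant are our own choices.

```python
import numpy as np
from scipy.stats import chi2

def variance_likelihood(sample_mean, sample_var, nlf, num_frames):
    """Likelihood of a per-pixel sample variance under a candidate NLF.

    nlf(sample_mean) predicts the population variance for this pixel;
    the scaled variance (F-1) * s^2 / sigma^2 is then evaluated under a
    chi-square distribution with F-1 degrees of freedom.
    """
    sigma2 = np.maximum(nlf(sample_mean), 1e-12)  # guard against zero variance
    x = (num_frames - 1) * sample_var / sigma2
    return chi2.pdf(x, df=num_frames - 1)
```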

C. NLF Estimation and Forgery Detection

As mentioned in Section I, we use video of a static scene where the camera and the objects are fixed during recording. In such video clips, the temporal variation of each pixel value results entirely from noise, not from the motion of objects or of the camera. Therefore, by calculating the temporal mean and variance of the pixel values at each pixel of the given video, we obtain both the temporal sample mean $\bar{x}$ in (8) and the temporal sample variance $s^2$ in (9) at each pixel. Each calculated mean value is divided by 255 because the mean function and the linear basis functions are normalized. In the same way, the variance values are divided by the maximum value, i.e., the normalized variances are in the range $[0, 1]$.

Obtaining a number of pairs of $\bar{x}$ and $s^2$ by temporal averaging, we evaluate the posterior probability of an NLF for parameter estimation. Note, however, that if forged objects are superimposed in a video, two NLFs (from the original and the superimposed video) coexist in a single video clip. Therefore, we introduce a latent variable $z$ to formulate a mixture model of two NLFs and calculate the responsibility of each pixel for estimating each NLF. The latent variable $z \in \{1, 2\}$ represents the source video from which the pixel comes. In this paper, we assume that the number of NLFs is known to be two. When the number of NLFs is unknown, we could apply an approach for estimating the number of clusters in a Gaussian mixture model. This is left for our future work.

The probability that pixel $n$ comes from the $k$th source with NLF $\tau_k$ is calculated as

P(z_n = k \mid \bar{x}_n, s_n^2) = \frac{\pi_k\, p(\bar{x}_n, s_n^2 \mid z_n = k)}{\sum_{j=1}^{2} \pi_j\, p(\bar{x}_n, s_n^2 \mid z_n = j)}    (13)

where $\bar{x}_n$ and $s_n^2$ are the sample mean and the sample variance of pixel $n$, respectively, and $\pi_k$ is the prior probability of the $k$th source. We cannot decide which source is authentic by analyzing the candidate NLFs because we have no prior information about "what is authentic," but we can indicate that the source most likely to be authentic is the one whose NLF is dominant for most of the pixels. Using (12), the likelihood of the $k$th source is rewritten as

p(\bar{x}_n, s_n^2 \mid z_n = k) = p\!\left(\chi^2\right)\Big|_{\chi^2 = (F-1)s_n^2/\tau_k(\bar{x}_n)}.    (14)

Since the parameters of the NLF should be known to calculate the posterior and vice versa, we use the EM algorithm [20] to simultaneously estimate the posteriors and the parameters of the NLF.

In the E-step, the likelihood that each pixel comes from source 1 with NLF $\tau_1$ is estimated by (14). The likelihood in the case of source 2 with NLF $\tau_2$ is estimated as well. Then the priors of the NLFs are updated as

\pi_k = \frac{1}{N_p} \sum_{n=1}^{N_p} P(z_n = k \mid \bar{x}_n, s_n^2)    (15)

where $N_p$ is the number of pixels. Substituting (14) and (15) into (13), we obtain the posterior probability of each pixel.

In the M-step, we estimate the parameters of the NLFs via MAP estimation. MAP estimation using a chi-square distribution results in a nonlinear optimization problem, where we should consider the high computation cost and local optima. In order to make the optimization faster and more stable, we approximate the chi-square distribution by a Gaussian distribution under the assumption that the degrees of freedom of the chi-square distribution are sufficiently large. Therefore, obtaining the MAP estimate results in solving the following least-squares problem with additional terms. We optimize the parameters of NLF $\tau_k$, that is, the weights $w_{k,1}, \ldots, w_{k,6}$ in (7), so that they minimize the following energy function:

E(\mathbf{w}_k) = \sum_{n} \gamma_{n,k}\,\big(s_n^2 - \tau_k(\bar{x}_n)\big)^2 + \lambda \sum_{i=1}^{6} w_{k,i}^2 + \sum_{n} P\big(\tau_k(\bar{x}_n)\big)    (16)

where $\gamma_{n,k} = P(z_n = k \mid \bar{x}_n, s_n^2)$ is the weight given by (13). The second and third terms on the right-hand side are derived from prior probabilities of the parameters. The second term is the smoothness term and $\lambda$ is the balancing parameter. The third term represents the penalty term and $P(\tau)$ is the penalty function

P(\tau) = \begin{cases} \Gamma & \text{if } \tau < 0 \text{ or } \tau > 1 \\ 0 & \text{otherwise} \end{cases}    (17)

where $\Gamma$ is a large penalty value. The penalty term limits the estimated NLF to the range from zero to one at any point.

Repeating the E-step and the M-step until the parameters converge, we estimate the probability of forgery at each pixel and the weight parameters of the linear basis functions for the NLF of each source. We set different fixed initial coefficient vectors for the first and the second NLF, and the initial prior probabilities are set to 0.9 and 0.1, respectively, in the current implementation. We confirmed through preliminary experiments that the optimization is not sensitive to the initial values.
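Putting the pieces together, the following is a simplified sketch of the EM loop over the per-pixel noise characteristics. It follows the structure described above, but it is not the authors' implementation: the Gaussian-approximated MAP M-step with smoothness and range-penalty terms is reduced here to a ridge-regularized weighted least squares, and all function and variable names are our own.

```python
import numpy as np
from scipy.stats import chi2

def em_forgery_detection(sample_mean, sample_var, num_frames,
                         mean_fn, basis, grid, n_iters=50, ridge=1e-3):
    """EM sketch: fit two candidate NLFs (authentic / superimposed) to
    per-pixel (mean, variance) points and return per-pixel posteriors.

    sample_mean, sample_var: flattened per-pixel statistics in [0, 1].
    mean_fn, basis: NLF mean function and basis sampled on `grid`
    (e.g., from the PCA sketch in Section IV-A).
    """
    df = num_frames - 1
    n_basis = basis.shape[0]
    # Interpolate the mean function and the basis onto each pixel's mean.
    phi = np.stack([np.interp(sample_mean, grid, b) for b in basis], axis=1)
    base = np.interp(sample_mean, grid, mean_fn)

    w = np.zeros((2, n_basis))
    w[1, 0] = 0.1                       # two different starting NLFs
    priors = np.array([0.9, 0.1])       # initial priors, as in the paper

    for _ in range(n_iters):
        # E-step: chi-square likelihood of each pixel under each NLF.
        lik = np.empty((2, sample_mean.size))
        for k in range(2):
            sigma2 = np.maximum(base + phi @ w[k], 1e-12)
            lik[k] = chi2.pdf(df * sample_var / sigma2, df=df)
        resp = priors[:, None] * lik
        resp /= np.maximum(resp.sum(axis=0, keepdims=True), 1e-300)
        priors = resp.mean(axis=1)

        # M-step: weighted ridge least squares for each NLF's weights.
        for k in range(2):
            wt = resp[k][:, None]
            a = (phi * wt).T @ phi + ridge * np.eye(n_basis)
            b = (phi * wt).T @ (sample_var - base)
            w[k] = np.linalg.solve(a, b)

    return resp[1], w   # per-pixel posterior of source 2, NLF weights
```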

V. EXPERIMENTS

We conducted experiments by applying our method to videos recorded in a laboratory and to videos recorded indoors and outdoors, as follows. As shown in Fig. 3, the noise characteristics of real video have different distributions with respect to the CRF. To evaluate the proposed method, forged videos were synthesized from two videos with different CRFs, and we applied our method to detect the forged regions. We first used a color checkerboard as a target object, and next we used videos recorded in normal indoor and outdoor environments.


Fig. 5. Example of the recorded video.

Fig. 6. Red squares indicate forged regions.

All the experiments were done on videos recorded by a Point Grey Research Flea digital camera. The CRF of this camera can be configured in a nonparametric form. We prepared CRFs for the camera from the database used in Section IV-A. We also set the frame rate to 30 fps and the resolution to 640 × 480 pixels. The captured videos are compressed by the lossless huffyuv codec, which yields a bit rate of more than 1 Mb/s.

A. Forgery Detection Based on NLF

First, we conducted a forgery detection experiment in a laboratory. We chose the CRFs used in Section III from the database and recorded the Macbeth Color Checkerboard. An example of the recorded video is shown in Fig. 5. The shutter time and electric gain were set so that all videos have similar mean brightness in the sequence. Fig. 3 shows the noise characteristics calculated from 285 frames of the recorded videos.

We created forged video clips from the video sources mentioned above. First, a pair of videos recorded under different parameters was chosen. We located 16 forged regions of 100 × 100 pixels on the 16 patches of the checkerboard. Each forged region was labeled from (a) to (p), as shown in Fig. 6. The pixel values in the regions over all frames of one video were overwritten by those of the other video. We created two forged videos: we chose the video recorded with CRF No.27 [Fig. 3(a)] as the original video, and the videos recorded with CRF No.53 [Fig. 3(b)] and No.164 [Fig. 3(c)] as the replaced videos.
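Creating such a test forgery amounts to overwriting the chosen regions in every frame of the original clip with the co-located pixels of the donor clip. A sketch of that overwrite (array and argument names are illustrative):

```python
import numpy as np

def superimpose_region(original: np.ndarray, donor: np.ndarray,
                       top: int, left: int, size: int = 100) -> np.ndarray:
    """Overwrite a size x size region of `original` (shape F x H x W or
    F x H x W x 3) with the co-located pixels of `donor` in every frame."""
    forged = original.copy()
    forged[:, top:top + size, left:left + size] = \
        donor[:, top:top + size, left:left + size]
    return forged
```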

The noise characteristics of the forged videos were calculated as described in Section IV. Using the EM algorithm, we estimated the NLF parameters and the posterior of forgery. The estimation results are shown in Figs. 7 and 8. We consider the NLF of the dominant pixels as the NLF of the original video (source 1). On the other hand, the other NLF is considered to be the NLF of the forged region (source 2). The plotted points are colored with respect to the posterior probability that the pixel came from source 2, denoted by the colorbar on the right. In the case shown in Fig. 7(a), the plotted points are clearly separated into the two sources except in regions (a), (h), and (n). This is because the two NLFs intersect one another at the mean pixel values of these regions.

Fig. 7. (a) Estimated NLFs of the forged video synthesized from the videos recorded with CRF No.27 and No.53. (b) Estimation of forgery posterior probability. The colorbar in (a) indicates the posterior probability that each noise characteristic point belongs to source 2.

Fig. 8. Estimation of the forged video synthesized from the videos recorded with CRF No.27 and No.164.

Fig. 9. Mean posterior probabilities that the pixels came from each source with NLF in the forged regions and the background. The labels shown below the horizontal axis correspond to the labels of the forged regions in Fig. 6. (a) CRF No.27 and No.53. (b) CRF No.27 and No.164.

Fig. 7(b) shows that all the forgeries were accurately detected. On the other hand, in the case shown in Fig. 8(a), the two estimated NLFs lie side by side. This overlap of NLFs results in the less accurate estimation shown in Fig. 8(b), but even here most forged regions are properly detected.

We show the accuracy evaluation of the proposed method in Fig. 9. The posteriors of the pixels belonging to each NLF are averaged within each forged region. Fig. 9 shows the mean posteriors of the 16 forged regions and the unforged background for each combination of the source videos. For the combination of CRF No.27 and No.53, the posteriors in all regions are correctly biased to one side. In contrast, the posteriors in some of the regions are misclassified for the combination of CRF No.27 and No.164.

The running time of this experiment using 64-bit MATLAB on a 2.66-GHz Intel Core2 Quad processor is approximately 27.5 s. Although the proposed method estimates forgeries thoroughly without an optimized implementation in this paper, we can accelerate the computation by using parallelization, since the E-step computes the probability of each pixel independently. Moreover, we can reduce the computational cost by subsampling the given video; in this case, there is a trade-off between speed and accuracy.


Fig. 10. Forged videos of realistic scenes. (a) Forged video 1 (wrapped box on a bookshelf). (b) Forged video 2 (bicycle in front of a building).

Fig. 11. Detection result of the video shown in Fig. 10(a).


B. Application to Realistic Scenes

The experiments mentioned in the previous section were done in a laboratory. In this section, we created forged videos of more realistic indoor and outdoor scenes, as shown in Fig. 10. The forged videos were created by cropping, resizing, and tuning brightness and contrast using Adobe Premiere. The video shown in Fig. 10(a) is a bookshelf with a wrapped box superimposed on the left column of the second row from the bottom. Fig. 10(b) shows a building where a bicycle is superimposed. The numbers of frames are 298 and 126, respectively. In these realistic scenes, some pixels vary significantly compared to their neighboring pixels. These high-variance pixels result from the motion of objects at those pixels and from changes in the illumination. We regarded these pixels as outliers and removed them with a manually set threshold.

Running on the same system as in the previous section, the proposed method requires approximately 20 s of computation. The detection results for the videos are shown in Figs. 11 and 12. As shown in Fig. 11(b), the wrapped box is accurately detected. On the other hand, the result for the outdoor scene shown in Fig. 12(b) looks noisy in the glass and wall regions. This is because the distributions of the noise characteristics from the forged and the unforged regions shown in Fig. 12(a) partially overlap. However, the posteriors of the pixels in the forged area are relatively higher than those in the background, which is enough to differentiate them.

Fig. 12. Detection result of the video shown in Fig. 10(b).

TABLE I
BIT RATE AND DETECTION QUALITY FOR VARIOUS CODECS

C. Effect of Video Compression

Finally, we examine the effect of video compression on forgery detection. We compressed the forged video that yields the noise characteristics shown in Fig. 7(a) by using several kinds of codecs. Both conventional and recent codecs were chosen: MPEG-2, Cinepak, and H.264. For each codec we set the encoding parameters to achieve the best image quality and chose a constant bit rate. The detection quality is evaluated by two criteria: a true positive rate, which represents the average posterior probability of forgery over all superimposed pixels in a frame, and a true negative rate, which represents the average posterior probability of authenticity over the background pixels in a frame.
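Given a per-pixel posterior map and a ground-truth mask of the superimposed region, these two criteria reduce to masked averages; a small sketch (names are illustrative):

```python
import numpy as np

def detection_quality(posterior: np.ndarray, forged_mask: np.ndarray):
    """True positive rate: mean posterior of forgery over superimposed
    pixels. True negative rate: mean posterior of authenticity (i.e.,
    1 - posterior of forgery) over background pixels."""
    tp_rate = posterior[forged_mask].mean()
    tn_rate = (1.0 - posterior[~forged_mask]).mean()
    return tp_rate, tn_rate
```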

Table I summarizes the bit rate and the detection quality with respect to the codec, compared with the result for the video encoded with the lossless huffyuv codec. The table shows that the detection quality of the proposed method depends on the codec. This limitation has two causes. One is block noise: conventional codecs increase artifact noise. The other is noise reduction by sophisticated codecs: recent codecs make full use of the motion information in a video sequence and reduce noise components effectively. Because the proposed method requires information about the noise, video compression limits its performance.

VI. DISCUSSION

In this study, we developed a method to detect forgeries in a static video scene on the basis of inconsistencies in NLFs. Obtaining the noise characteristic at each pixel, i.e., the per-pixel relation between the temporal mean and the temporal variance, we calculate the posterior probability of the source video from which each pixel comes. The parameters of the NLFs and the authenticity of each pixel are estimated by using the expectation maximization (EM) algorithm. In particular, we formulate the likelihood of the calculated characteristic points by using the chi-square distribution in order to deal with finite sets of pixel values.


Our method estimates the NLFs and the posterior of forgery highly accurately when the two NLFs are well separated. Even in worse conditions, the method can detect some suspicious pixels in the forged region, which is useful for identifying regions in the input data that may have been tampered with. We demonstrated the effectiveness of the proposed method both for videos recorded in a laboratory and for more realistic scenes.

The following considerations suggest future work. First, our method currently deals only with videos of static scenes, but it should be further extended to work with people and moving objects, separated by using background subtraction. Second, the proposed method depends on the codec used for video compression because of artifact noise and noise reduction. To overcome this limitation, the noise characteristics of compressed videos should be considered in more detail. Since video compression itself is a source of noise, analyzing the noise characteristics of compressed videos could serve as a fingerprint of the video. Finally, if postprocessing during forgery, such as tuning brightness and contrast, heavily affects the noise characteristics, the proposed method can fail to fit the NLF using the representation shown in (7). In such a case, the method can be extended to deal with such operations by adding extra parameters that represent stretching of the mean and the variance.

REFERENCES

[1] S.-J. Lee and S.-H. Jung, "A survey of watermarking techniques applied to multimedia," in Proc. IEEE Int. Symp. Industrial Electronics, 2001, vol. 1, pp. 272–277.

[2] T. Van Lanh, K.-S. Chong, S. Emmanuel, and M. S. Kankanhalli, "A survey on digital camera image forensic methods," in Proc. IEEE Int. Conf. Multimedia and Expo, 2007, pp. 16–19.

[3] M. K. Johnson and H. Farid, "Exposing digital forgeries by detecting inconsistencies in lighting," in Proc. Workshop on Multimedia and Security, 2005, pp. 1–10.

[4] M. Johnson and H. Farid, "Exposing digital forgeries through chromatic aberration," in Proc. Int. Multimedia Conf., 2006, pp. 48–55.

[5] Z. Lin, R. Wang, X. Tang, and H.-Y. Shum, "Detecting doctored images using camera response normality and consistency," in Proc. IEEE Computer Society Conf. Computer Vision and Pattern Recognition, 2005, vol. 1, pp. 1087–1092.

[6] J. Lukáš, J. Fridrich, and M. Goljan, "Detecting digital image forgeries using sensor pattern noise," in Proc. Society of Photo-Optical Instrumentation Engineers Conf., 2006, vol. 6072, pp. 362–372.

[7] J. Fridrich, D. Soukal, and J. Lukáš, "Detection of copy-move forgery in digital images," in Proc. Digital Forensic Research Workshop, Cleveland, OH, 2003.

[8] S. Ye, Q. Sun, and E.-C. Chang, "Detecting digital image forgeries by measuring inconsistencies of blocking artifact," in Proc. IEEE Int. Conf. Multimedia and Expo, 2007, pp. 12–15.

[9] W. Wang and H. Farid, "Exposing digital forgeries in video by detecting duplication," in Proc. Workshop on Multimedia & Security, Int. Multimedia Conf., New York, NY, 2007, pp. 35–42.

[10] C. Liu, W. Freeman, R. Szeliski, and S. B. Kang, "Noise estimation from a single image," in Proc. IEEE Computer Society Conf. Computer Vision and Pattern Recognition, 2006, vol. 1, pp. 901–908.

[11] W. Wang and H. Farid, "Exposing digital forgeries in interlaced and deinterlaced video," IEEE Trans. Inf. Forensics Security, vol. 2, no. 3, pp. 438–449, Sep. 2007.

[12] C.-C. Hsu, T.-Y. Hung, C.-W. Lin, and C.-T. Hsu, "Video forgery detection using correlation of noise residue," in Proc. IEEE 10th Workshop Multimedia Signal Processing, 2008, pp. 170–174.

[13] M. Kobayashi, T. Okabe, and Y. Sato, "Detecting video forgeries based on noise characteristics," in Proc. 3rd Pacific Rim Symp. Advances in Image and Video Technology, 2009, pp. 306–317.

[14] Y. Matsushita and S. Lin, "Radiometric calibration from noise distributions," in Proc. IEEE Computer Society Conf. Computer Vision and Pattern Recognition, 2007, pp. 1–8.

[15] J. Takamatsu, Y. Matsushita, and K. Ikeuchi, "Estimating radiometric response functions from image noise variance," in Proc. Eur. Conf. Computer Vision, 2008, pp. 623–637.

[16] J. Takamatsu, Y. Matsushita, and K. Ikeuchi, "Estimating camera response functions using probabilistic intensity similarity," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Jun. 23–28, 2008, pp. 1–8.

[17] Y. Matsushita and S. Lin, "A probabilistic intensity similarity measure based on noise distributions," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Jun. 17–22, 2007, pp. 1–8.

[18] G. Healey and R. Kondepudy, "Radiometric CCD camera calibration and noise estimation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 16, no. 3, pp. 267–276, Mar. 1994.

[19] Y. Tsin, V. Ramesh, and T. Kanade, "Statistical calibration of CCD imaging process," in Proc. IEEE Int. Conf. Computer Vision, 2001, vol. 1, pp. 480–487.

[20] C. Bishop, Pattern Recognition and Machine Learning. New York: Springer, 2006.

Michihiro Kobayashi received the M.S. degree in engineering from Yokohama National University, Japan, in 2007, and the Ph.D. degree in information science and technology from the University of Tokyo, Japan, in 2010.

He joined the Institute of Industrial Science at the University of Tokyo, where he is currently a project researcher. His research interests include noise analysis, computer vision, and pattern recognition on the basis of physical models.

Takahiro Okabe (M'03) received the B.S. degree in physics from the School of Science, the University of Tokyo, Japan, in 1997, and the M.S. degree in physics from the Graduate School of Science, the University of Tokyo, in 1999.

In 2001, he joined the Institute of Industrial Science at the University of Tokyo, where he is currently a research associate. His primary research interests are in the fields of computer vision, image processing, pattern recognition, and computer graphics, especially in their physical and mathematical aspects.

Yoichi Sato (M'99) received the B.S.E. degree from the University of Tokyo, Japan, in 1990, and the M.S. and Ph.D. degrees in robotics from the School of Computer Science, Carnegie Mellon University, in 1993 and 1997, respectively.

He is a Professor at the Institute of Industrial Science, the University of Tokyo. His research interests include physics-based vision, reflectance analysis, image-based modeling and rendering, tracking and gesture analysis, and computer vision for HCI.