

Received November 15, 2020, accepted November 28, 2020, date of publication December 7, 2020, date of current version December 24, 2020.

Digital Object Identifier 10.1109/ACCESS.2020.3042780

PRNU-Based Content Forgery Localization Augmented With Image Segmentation
XUFENG LIN AND CHANG-TSUN LI, (Senior Member, IEEE)
School of Information Technology, Deakin University, Geelong Waurn Ponds Campus, Waurn Ponds, VIC 3216, Australia

Corresponding author: Xufeng Lin ([email protected])

This work was supported by the EU Horizon 2020 Project, entitled Computer Vision Enabled Multimedia Forensics and People Identification (Acronym: IDENTITY), under Project 690907.

ABSTRACT The advances in image editing and retouching technology have enabled an unskilled person to easily produce visually indistinguishable forged images. To detect and localize such forgeries, many image forensic tools rely on visually imperceptible clues, e.g. the subtle traces or artifacts introduced during image acquisition and processing, and slide a regular, typically square, detection window across the image to search for discrepancies of specific clues. Such a sliding-window paradigm confines the explorable neighborhood to a regular grid and inevitably limits its capability in localizing forgeries in a broad range of shapes. While image segmentation that generates segments adhering to object boundaries might be a promising alternative to the sliding window-based approach, it is generally believed that the potential of the segmentation-based detection scheme is hindered by object removal forgeries where meaningful segments are often unavailable. In this work, we take forgery localization based on photo-response non-uniformity (PRNU) noise as an example and propose a segmentation-based forgery localization scheme that exploits the local homogeneity of visually indiscernible clues to mitigate the limitations of existing segmentation approaches that are merely based on visually perceptible content. We further propose a multi-orientation localization scheme that integrates the forgery probabilities obtained with image segmentation and multi-orientated detection windows. The multi-orientation scheme aggregates the complementary strengths of image segmentation and multi-oriented detection windows in localizing object insert and object removal forgeries. Experimental results on a public realistic tampering image dataset demonstrate that the proposed segmentation-based and multi-orientation forgery localization schemes outperform existing state-of-the-art PRNU-based forgery localizers in terms of both region and boundary F1 scores.

INDEX TERMS Digital image forensics, image forgery localization, image segmentation, multimedia security, multi-orientation detection, photo-response non-uniformity noise.

I. INTRODUCTION
The ease of manipulating images with increasingly powerful image editing tools has also led to the growing appearance of digitally altered forgeries, which raise serious concerns about the credibility of digital images. Developing image forgery detection and localization techniques, therefore, becomes essential under the circumstances where digital images serve as critical evidence. A variety of approaches have been proposed to detect and localize image forgeries. Active techniques such as digital watermarking [1], [2] are effective in verifying the authenticity of an image, but the requirement of

The associate editor coordinating the review of this manuscript and approving it for publication was Li He.

embedding additional information at the creation of the image limits their widespread use. On the contrary, passive or blind techniques rely on the image data itself without requiring any pre-embedded information and thus have attracted great interest over the past two decades. The passive image forgery detection and localization techniques in the literature mainly fall into five categories:

(1) The first category is based on specific statistical properties of natural images such as higher-order wavelet statistics [3], [4], image texture features [5] and residual-based features [6], [7].

(2) The second category includes the techniques that seek the traces left by specific image manipulations such as resampling [8], [9], contrast enhancement [10], [11],

VOLUME 8, 2020 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ 222645


median filtering [12], copy-move manipulations [13] and JPEG compression [14]–[16].

(3) The third category relies on the regularities or constraints that make images physically plausible. For instance, anomalies in lighting [17], shadows [18], [19], reflections [20] and perspective constraints [21] have been exploited for exposing image manipulations.

(4) The fourth category exploits the artifacts introduced in the image acquisition pipeline of digital cameras. These artifacts can be either caused by specific processing components, such as demosaicking artifacts [22], camera response function abnormalities [23], [24], optical aberrations [25] and imaging sensor noise [26]–[28], or caused by the complex interplay of multiple components, such as local noise levels [29].

(5) Inspired by the success of convolutional neural networks (CNN) in various multimedia tasks, the fifth category comprises the techniques that automatically learn the features for image forgery detection or localization using CNN [30]–[32].

Besides the above-mentioned techniques, recent years have seen some works on fusing the outputs of multiple forensic detectors, e.g. under the framework of fuzzy logic [33], the Dempster-Shafer theory of evidence (DSTE) [34], simple logical rules [35], [36] or Bayesian inference [37], but the performance of the overall system depends on that of each individual forensic detector. To localize the forgeries in an image, a prevailing paradigm is to move a regular, typically square, detection window across the image and analyze the forensic clues in the detection window at different locations. Such a sliding-window paradigm allows for the convenient and efficient processing of the image but also has two inherent limitations: 1) The regular shape of the detection window hinders the detection of forgeries in various possible shapes. For instance, a square detection window is not suitable for detecting tree branches or human limbs. 2) Due to the object boundary unawareness of the sliding-window detection scheme, pixels of different natures, e.g. some pixels are forged and others are pristine, are often contained in the same window. Jointly processing those 'heterogeneous' pixels without any distinction makes the detection highly error-prone.
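To make the sliding-window paradigm concrete, the following sketch (an illustration of the general scanning scheme, not the authors' code; the window size and stride are hypothetical parameters) enumerates the positions of a square detection window slid across an image on a regular grid:

```python
def sliding_windows(height, width, win=64, stride=8):
    """Yield the (top, left) corners of square detection windows
    slid across a height x width image on a regular grid."""
    for top in range(0, height - win + 1, stride):
        for left in range(0, width - win + 1, stride):
            yield top, left

# A 256x256 image scanned with a 64x64 window at stride 8
# yields 25 positions per axis, i.e. 625 windows in total.
n_windows = sum(1 for _ in sliding_windows(256, 256))
```

Every pixel then inherits the test statistic of the window(s) covering it, which is precisely why windows straddling a forgery boundary mix pixels of different natures.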

The above limitations can be alleviated by image segmentation. Local structure of visual content is typically exploited by segmentation algorithms to divide an image into a number of 'segments' or 'superpixels', each of which consists of a group of connected pixels that are perceptually or statistically similar. The resultant superpixels provide informative clues for inferring the boundaries of the forged regions. However, given the variability of possible forgeries, the limitations of image segmentation are also apparent: 1) There might be no distinct local structure information in homogeneous regions to guide the segmentation. 2) The boundaries of the forged regions do not always align with the object boundaries. For instance, it is not uncommon to copy some collateral background pixels along with the forged object to make it blend into the background more naturally. In such cases, pixels of different natures, i.e. the collateral forged pixels and their neighboring authentic pixels, might be grouped into the same superpixel and cause further uncertainties.

In this article, we take forgery localization based on photo-response non-uniformity (PRNU) noise as an example and show that it is possible to mitigate the limitations of segmentation merely based on visual content by incorporating the local homogeneity of visually imperceptible clues. The main contributions of our work can be summarized as follows.

• We identify the essential criteria that a superpixel segmentation algorithm should satisfy for the task of forgery localization and propose a multi-scale segmentation-based localization scheme in cooperation with some segmentation algorithm that fulfills the identified criteria.

• We propose a novel superpixel segmentation algorithm that encodes both visual content and visually imperceptible clues for content forgery localization. Different from all previous works that only use visual boundary information for image segmentation, the proposed algorithm incorporates the information from visually imperceptible forensic clues to guide the segmentation for forgery localization.

• We propose a multi-orientation fusion strategy that aggregates the complementary strengths of image segmentation and multi-oriented detection in localizing various types of image forgeries.

• We conduct comprehensive experiments on the effects of various segmentation algorithms on different forgery types, i.e. hard-boundary and soft-boundary forgeries, in terms of both region and boundary F1 scores.

The rest of this article is organized as follows. In Section II, we will revisit the background of PRNU-based forgery localization and some related work. In Section III, the details of the proposed segmentation-based and multi-orientation forgery localization schemes will be given. Section IV presents the experimental results and analysis on realistic forgeries. Finally, Section V concludes the work.

II. RELATED WORK
In this section, we will first revisit the background of PRNU-based forgery localization. PRNU noise mainly arises from the inhomogeneity of silicon wafers introduced during imaging sensor fabrication and manifests itself as the pixel-to-pixel variation in light sensitivity. It is present in every image and practically unique to the sensor that captures the image. Therefore, its absence can be used to signify the forgery in an image under investigation, provided that the reference PRNU z of the source camera is available for comparison. For this purpose, a binary hypothesis test is typically carried out for each pixel i in the image's noise residual ω, i.e. the difference between the original image and its denoised version:

\[
\begin{cases}
h_0: \omega_i = \upsilon_i \\
h_1: \omega_i = z_i + \upsilon_i
\end{cases} \tag{1}
\]



where υi is Gaussian random noise. If pixel i is forged, zi is supposed to be absent in ωi, which corresponds to the null hypothesis h0. Otherwise, the alternative hypothesis h1 is accepted.

Due to the weak nature of z, the above hypothesis testing usually requires joint processing of a large number of pixels, so it is typically conducted in a sliding-window fashion by calculating the normalized cross-correlation ρi over the pixels within a w×w detection window Ωi centered at each pixel i:

\[
\rho_i = \frac{\sum_{j\in\Omega_i}(\omega_j-\bar{\omega})(z_j-\bar{z})}{\sqrt{\sum_{j\in\Omega_i}(\omega_j-\bar{\omega})^2}\,\sqrt{\sum_{j\in\Omega_i}(z_j-\bar{z})^2}}, \tag{2}
\]
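A direct NumPy transcription of Eq. (2) for a single detection window might look as follows (an illustrative sketch; the function and variable names are ours, not the authors'):

```python
import numpy as np

def window_correlation(residual, prnu, center, w):
    """Normalized cross-correlation (Eq. 2) between the noise residual
    and the reference PRNU inside a w x w window centered at `center`."""
    r, c = center
    h = w // 2
    a = residual[r - h:r + h + 1, c - h:c + h + 1].ravel()
    b = prnu[r - h:r + h + 1, c - h:c + h + 1].ravel()
    a = a - a.mean()  # subtract the window means, as in Eq. (2)
    b = b - b.mean()
    denom = np.sqrt((a ** 2).sum()) * np.sqrt((b ** 2).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0
```

In practice this statistic is evaluated at every pixel location, which is typically done with FFT-based filtering rather than an explicit per-window loop.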

where the overbar stands for the respective arithmetic mean within the detection window. Provided that the distributions of the test statistic ρi under h0 and h1, i.e. p(ρi|h0) and p(ρi|h1), are available, either the Neyman-Pearson criterion [26] or Bayes' rule [27], [28] can be used to make the final decision on the authenticity of pixel i. We will adopt Bayes' rule as in [27], [28] for the easier formulation of multi-clue fusion under a uniform probability framework. We model p(ρi|h0) as a zero-mean Gaussian distribution N(0, σ0²) and estimate the variance using the images captured by cameras that are different from the source camera. While p(ρi|h1) can also be modeled as a Gaussian distribution N(ρ̂i, σ1²), its estimation is more challenging since the expected correlation ρ̂i is highly dependent on the image content. To address this problem, we adopt the correlation predictor [26] based on local image features, namely the image intensity, texture, signal flattening, texture-intensity features and their second-order terms, to estimate the mean ρ̂i and variance σ1² of p(ρi|h1). Finally, the forgery probability of pixel i is formulated as

\[
P_i = \frac{p(\rho_i|h_0)}{p(\rho_i|h_0)+p(\rho_i|h_1)}
    = \left(1+\exp\left(\frac{\rho_i^2}{2\sigma_0^2}-\frac{(\rho_i-\hat{\rho}_i)^2}{2\sigma_1^2}-\ln\frac{\sigma_1}{\sigma_0}\right)\right)^{-1}. \tag{3}
\]
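Eq. (3) can be sketched directly in code (our own illustrative helper, assuming the two Gaussian models stated above; `rho_hat`, `sigma0` and `sigma1` would come from the correlation predictor and the camera calibration):

```python
import numpy as np

def forgery_probability(rho, rho_hat, sigma0, sigma1):
    """Posterior forgery probability of a pixel, Eq. (3):
    p(rho|h0) / (p(rho|h0) + p(rho|h1)), with p(rho|h0) ~ N(0, sigma0^2)
    and p(rho|h1) ~ N(rho_hat, sigma1^2)."""
    log_ratio = (rho ** 2) / (2 * sigma0 ** 2) \
              - (rho - rho_hat) ** 2 / (2 * sigma1 ** 2) \
              - np.log(sigma1 / sigma0)
    return 1.0 / (1.0 + np.exp(log_ratio))
```

When the measured correlation is near zero while the predictor expects a substantial ρ̂i, the probability approaches 1 (PRNU absent, i.e. likely forged); when the measurement matches the prediction, it approaches 0.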

In the above sliding window-based framework, the test statistic ρi is calculated over all the pixels within the detection window. It becomes problematic if the detection window falls across the boundary between the pristine and the forged regions. In such cases, both the pristine and forged pixels contribute to the calculation of ρi, which gives rise to the chance of the pixels near the boundaries being undetected. This problem can be alleviated by means of hard segmentation, as shown in Chierchia et al.'s work [38], but its reliance on accurate manual segmentation makes it infeasible in most practical scenarios. They further proposed a method [39] based on guided filtering [40] to make the calculation of ρi better aligned with the boundaries of objects. However, its effectiveness relies on the visually perceptible boundary information of the image and thus is largely compromised for object removal forgeries where no meaningful boundary information is available. In light of the fact that

meaningful forgeries usually appear in a group of connected pixels, another direction of improvement lies in taking into account pixels' spatial dependencies in the final decisions. For instance, Chierchia et al. [27] sought to improve the localization performance by modeling the decision statistic as a Markov Random Field (MRF) and recasting the forgery localization as an optimization problem under the Bayesian framework. This method only considers the neighborhood consistency of the final decisions but ignores the content similarity within the neighborhood.

To allow for localizing small-sized forgeries, Korus and Huang [28] proposed a segmentation-based strategy that calculates the test statistic ρi only using the pixels with similar intensity levels as the central pixel i, as well as two fusion strategies, i.e. multi-scale and adaptive-window, that combine multiple candidate forgery probability maps obtained with detection windows of different sizes. Additionally, they adopted the Conditional Random Fields (CRF) model to incorporate the content-dependent neighborhood interactions into the final decisions. Although only the intensity difference between individual pixels is considered, the incorporation of image content in the simple segmentation-based strategy and the CRF-based model results in significant performance improvement [16], [28]. Further along this path, an interesting question arises: can further improvement be gained if the image content is exploited in a more sophisticated and comprehensive way? This motivates us to resort to explicit image segmentation that fully exploits the local structure and homogeneity of the image to improve the forgery localization performance.

III. PROPOSED METHODS
A. SEGMENTATION-BASED FORGERY LOCALIZATION SCHEME
For PRNU-based forgery localization, a straightforward way to exploit image content would be segmenting the image into a number of superpixels according to the visual content and calculating the forgery probability for each superpixel. However, compared to the methods based on regular detection windows that are able to generate pixel-wise probability maps, this will result in a low-resolution forgery probability map because the pixels belonging to the same superpixel are assigned the same probability. Consequently, the chance of mis-detection for object removal forgeries will be considerably increased if the size of the superpixel is not appropriately specified. For this reason, we apply image segmentation at different scales and fuse the resultant forgery probability maps to form a single informative probability map. The framework of the proposed segmentation-based multi-scale localization scheme is illustrated in Fig. 1.

1) MULTI-SCALE IMAGE SEGMENTATION
Given the variety of superpixel segmentation algorithms, not all of them suffice for our purpose. We identify a few



important criteria that a superpixel segmentation algorithm should satisfy for the task of forgery localization:

• Boundary adherence. This criterion measures the agreement between the boundaries of the objects and the resultant superpixels. A superpixel algorithm with good boundary adherence effectively avoids segmenting different objects or different parts of an object into the same superpixel, thus reducing the risk of generating heterogeneous superpixels containing both pristine and forged pixels for object insert forgeries.

• Controllability over superpixel number. Some algorithms do not provide direct control over the number of generated superpixels. As the segmentation needs to be carried out at different scales, easy control over the number of generated superpixels is preferable.

• Balanced segmentation granularity. Some superpixel algorithms generate superpixels of heavily unbalanced sizes. The over-sized superpixels substantially increase the chance of generating heterogeneous superpixels while the under-sized superpixels reduce the reliability of the detection. Thus, an algorithm capable of generating superpixels of balanced size is desirable.

Based on these criteria, we shortlist three superpixel algorithms for multi-scale segmentation:

• Simple Linear Iterative Clustering (SLIC) [41]: The SLIC algorithm iteratively updates superpixel centers and assigns pixels to their closest centers in the 5-dimensional pixel color and coordinate space until the algorithm converges. Its simplicity, computational efficiency and high-quality overall segmentation results make it one of the most widely used superpixel algorithms in various applications.

• Entropy Rate Superpixel segmentation (ERS) [42]: The ERS algorithm considers each pixel as a vertex in a graph and formulates the segmentation as an optimization problem of graph topology. It incrementally optimizes an objective function consisting of the entropy rate of random walks on the graph and a balancing term to achieve homogeneous and similar-sized superpixels. ERS exhibits remarkable boundary adherence, which is desirable for localizing object insert forgeries.

• Extended Topology Preserving Segmentation (ETPS) [43], [44]: The ETPS algorithm initially partitions the image into the desired number of regular superpixels and continuously modifies the boundary pixels in a coarse-to-fine manner by optimizing an objective function that encodes information about colors, positions, boundaries and topologies. The diversity of information encoded in the objective function and the efficient coarse-to-fine optimization make ETPS one of the state-of-the-art superpixel algorithms in terms of both segmentation quality and efficiency.
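As a rough illustration of the mechanics shared by such algorithms, the sketch below implements a bare-bones SLIC-style clustering in the joint (intensity, position) space of a grayscale image. It is our own simplification, not the algorithm of [41]: it omits color spaces, gradient-based seeding and connectivity enforcement, and the parameters are illustrative.

```python
import numpy as np

def slic_like(image, K, compactness=10.0, iters=5):
    """Minimal SLIC-style superpixels: k-means over (intensity, y, x)
    with cluster centers initialized on a regular grid.
    `image` is an (H, W) grayscale array; returns an (H, W) label map."""
    H, W = image.shape
    step = int(np.sqrt(H * W / K))
    ys, xs = np.meshgrid(np.arange(step // 2, H, step),
                         np.arange(step // 2, W, step), indexing="ij")
    centers = np.stack([image[ys, xs].ravel(),
                        ys.ravel().astype(float),
                        xs.ravel().astype(float)], axis=1)
    yy, xx = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    feats = np.stack([image.ravel(), yy.ravel().astype(float),
                      xx.ravel().astype(float)], axis=1)
    w = compactness / step  # trades color fidelity against compactness
    for _ in range(iters):
        d_col = (feats[:, None, 0] - centers[None, :, 0]) ** 2
        d_pos = ((feats[:, None, 1:] - centers[None, :, 1:]) ** 2).sum(-1)
        labels = np.argmin(d_col + (w ** 2) * d_pos, axis=1)
        for k in range(len(centers)):  # recompute the cluster means
            m = labels == k
            if m.any():
                centers[k] = feats[m].mean(axis=0)
    return labels.reshape(H, W)
```

The desired superpixel number K enters only through the grid initialization, which is what makes multi-scale segmentation a matter of rerunning the same procedure with different K.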

For more details about the above superpixel algorithms, we refer the reader to the original papers. Note that the multi-scale framework in Fig. 1 is similar to the work of Zhang et al. [45]. However, their analysis is limited to the SLIC algorithm and falls short of examining the impact of segmentation on different types of image forgeries.

FIGURE 1. The segmentation-based forgery localization scheme. For more details about the Adaptive Window (AW) and Multi-Scale (MS) fusion algorithms, please refer to Section IV-C or [28].

2) EXPLOITING HOMOGENEITY OF PRNU
As mentioned in Section II, only relying on the visual content for image segmentation might become problematic for object removal forgeries where no distinct visual information is available. Although this problem can be mitigated by the multi-scale strategy, it would be beneficial if the additional information from PRNU can be incorporated to guide the segmentation at each scale. The perceptually invisible PRNU not only provides useful clues when salient visual information is unavailable but also serves as a supplement to the visual information to eliminate the ambiguity in regions containing complex patterns or structures. In what follows, we will describe how the homogeneity of PRNU can be integrated to guide the segmentation.

Let N be the number of pixels in the image and K be the desired number of superpixels. The partitioning of the image into superpixels can be represented by a set of random variables s = (s1, …, sN), where si ∈ {1, …, K} denotes the superpixel to which pixel i belongs. Following the ETPS algorithm in [43], [44], we formulate the image segmentation as an optimization problem with the following energy function

\[
E(\mathbf{s},\boldsymbol{\mu},\mathbf{c},\mathbf{P}) = \sum_i E_{col}(s_i, c_{s_i}) + \lambda_{pos}\sum_i E_{pos}(s_i, \mu_{s_i}) + \lambda_b\sum_i\sum_{j\in\mathcal{N}_8(i)} E_b(s_i, s_j) + E_{topo}(\mathbf{s}) + E_{size}(\mathbf{s}) + \lambda_{etp}\sum_i E_{etp}(s_i, P_{s_i}), \tag{4}
\]

where c = (c1, …, cK), µ = (µ1, …, µK), and P = (P1, …, PK) are respectively the set of the mean colors, the mean positions, and the forgery probabilities for K superpixels. The details of the different energy terms in Eq. (4) are given below.



FIGURE 2. Demonstration of macropixel division for the coarse-to-fine optimization. The regions enclosed by red lines are superpixels and the regular grids bounded by blue dotted lines are macropixels.

Appearance Coherence:
\[
E_{col}(s_i, c_{s_i}) = \|I(i) - c_{s_i}\|_2^2. \tag{5}
\]
This term measures the squared Euclidean distance in the CIELAB color space between pixel i and the mean color of the superpixel that i belongs to. It encourages the color homogeneity within superpixels.

Shape Regularity:
\[
E_{pos}(s_i, \mu_{s_i}) = \|L(i) - \mu_{s_i}\|_2^2. \tag{6}
\]
It measures the coordinate distance between pixel i and the mean position of the superpixel that i belongs to. This term encourages compact superpixels of regular shapes.

Boundary Length:
\[
E_b(s_i, s_j) = \begin{cases} 1, & \text{if } s_i \neq s_j \\ 0, & \text{otherwise} \end{cases} \tag{7}
\]
This term measures the length of borders between superpixels. Irregular boundaries normally have a length longer than regular boundaries, thus this term penalizes local irregularities in the superpixel boundaries. Note that encouraging the shape and boundary regularity of the generated superpixels does not contradict our aim to localize forgeries of various shapes because forged objects or regions rarely have wiggly shapes and boundaries.

Topology Preservation: The term Etopo(s) is used to enforce that each superpixel is composed of a single connected component without holes. This is done by checking if changing the assignment of a boundary pixel will violate connectivity.

Minimal Size: The term Esize(s) returns a value of ∞ if the size of any resultant superpixel is less than 1/4 of its initial size. It encourages superpixels of similar sizes.

PRNU Homogeneity:
\[
E_{etp}(s_i, P_{s_i}) = \left(255 \cdot H(P_{s_i})\right)^2, \tag{8}
\]
where Psi is the forgery probability of superpixel si and H(Psi) = −Psi log2 Psi − (1 − Psi) log2(1 − Psi) is the entropy of a Bernoulli process characterized by Psi. If a superpixel contains both pristine and forged pixels, the estimated forgery probability Psi will tend to be closer to 0.5, where H(Psi) attains its maximum value. Thus, H(Psi) is an indicator of the homogeneity of PRNU within a superpixel. Actually, a similar measurement has been used in [28], [46] to assess the confidence or reliability of the estimated forgery probability. The factor 255 is used to promote Eetp(si, Psi) to a level comparable with the color energy term Ecol(si, csi). Hereafter, we will refer to this algorithm as the entropy-guided ETPS (EG-ETPS) algorithm.

We adopt the coarse-to-fine optimization framework in [43] to minimize the objective function in Eq. (4). As shown in Algorithm 1, the algorithm starts by equally partitioning the image into K non-overlapping square grids of size ⌊√(N/K)⌋ × ⌊√(N/K)⌋ px, where N is the total number of pixels and ⌊·⌋ is the floor function. Each grid is considered as an initial superpixel and further divided into a number of macropixels of size M × M pixels. M is initially set to 64 but will be reset to ⌊√(N/K)⌋/2 if ⌊√(N/K)⌋ < 64 to allow for macropixel division at the beginning of the algorithm. As demonstrated in Fig. 2, a macropixel is a square image block that will be jointly considered to evaluate the objective energy function, i.e. it is what pixel i refers to in Eq. (4), and at the finest level, each macropixel consists of only one pixel. Initially, the statistics of mean color c, mean position µ and forgery probability P are computed for each superpixel based on the corresponding statistics of the macropixels comprising the superpixel. Then, the following coarse-to-fine optimization iteratively updates the segmentation by proposing small local changes at boundaries.

First, all the boundary macropixels, i.e. those with at least one adjacent macropixel belonging to a different superpixel, are put into a FIFO priority list and popped out one by one to check if changing the label of the popped macropixel i will violate connectivity. If the label change is admissible and results in an energy decrease, we assign macropixel i to the one of its neighboring superpixels that gives the lowest energy. Meanwhile, we incrementally update the mean colors, mean positions and forgery probabilities of the two involved superpixels (i.e. the superpixel that macropixel i belonged to and the superpixel that macropixel i is newly assigned to) and append the boundary macropixels adjacent to i to the end of the list. This process is repeated until the list is empty,



Algorithm 1 Entropy-Guided ETPS Algorithm
Input:
  I: image to be segmented;
  r: camera reference PRNU;
  n: noise residual extracted from I;
  I, T: intensity and texture feature maps;
  S, ϒ: signal-flattening and texture-intensity feature maps;
  Ψ: camera parameters incl. predictors under h0 and h1;
  λ: weighting factors in the objective energy function;
  K: the desired number of superpixels;
Output:
  s: resultant segmentation of the image;

 1. Partition I into K non-overlapping square superpixels;
 2. Partition each superpixel into regular macropixels of size 64×64 and set current maximal macropixel size m = 64;
 3. Calculate the initial energy, Eq. (4), for each superpixel;
 4. do
 5.   if m < 16 then
 6.     set λetp = 0
 7.   end if
 8.   Initialize a FIFO list with all boundary macropixels;
 9.   while list is not empty do
10.     Pop out boundary macropixel i from list;
11.     if invalid_connectivity(i) then
12.       continue
13.     end if
14.     si = argmin_si E(s, µ, c, P)
15.     if si is updated then
16.       Incrementally update µ, c and P for the two superpixels involved;
17.       Append any boundary macropixel j in the 4-connected neighborhood of i to the list;
18.     end if
19.   end while
20.   if m > 1 then
21.     Bisect or quadrisect any divisible macropixel;
22.     Update maximal macropixel size m ← ⌈m/2⌉;
23.   else
24.     break
25.   end if
26. while

at which point the optimization will proceed to the next finer level by bisecting or quadrisecting each macropixel to a smaller size.
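The boundary-refinement loop above can be sketched in a few lines of Python. This is a minimal, color-only stand-in: the real energy of Eq. (4) combines color, position, boundary-length and PRNU homogeneity terms, and the connectivity check is reduced here to a crude guard, so all function and variable names are illustrative rather than taken from the paper.

```python
from collections import deque
import numpy as np

def refine_boundaries(colors, labels):
    """One level of boundary refinement: greedily reassign each boundary
    macropixel to the 4-connected neighboring superpixel whose mean color
    is closest (a simplified, color-only stand-in for minimizing Eq. (4))."""
    h, w = labels.shape

    def neighbors(y, x):
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                yield ny, nx

    # Incrementally maintained sufficient statistics per superpixel.
    sums, counts = {}, {}
    for y in range(h):
        for x in range(w):
            l = labels[y, x]
            sums[l] = sums.get(l, 0.0) + colors[y, x]
            counts[l] = counts.get(l, 0) + 1

    # FIFO list of boundary macropixels.
    fifo = deque((y, x) for y in range(h) for x in range(w)
                 if any(labels[ny, nx] != labels[y, x]
                        for ny, nx in neighbors(y, x)))
    while fifo:
        y, x = fifo.popleft()
        cur = labels[y, x]
        if counts[cur] == 1:
            continue  # crude connectivity guard: never empty a superpixel
        best = cur
        best_cost = (colors[y, x] - sums[cur] / counts[cur]) ** 2
        for ny, nx in neighbors(y, x):
            cand = labels[ny, nx]
            if cand == cur:
                continue
            cost = (colors[y, x] - sums[cand] / counts[cand]) ** 2
            if cost < best_cost:
                best, best_cost = cand, cost
        if best != cur:
            # Incremental update of the two involved superpixels.
            sums[cur] -= colors[y, x]; counts[cur] -= 1
            sums[best] += colors[y, x]; counts[best] += 1
            labels[y, x] = best
            fifo.extend(neighbors(y, x))  # re-examine the 4-neighborhood
    return labels
```

The same incremental bookkeeping (here just a running sum and count per superpixel) is what makes the energy evaluation cheap enough to run at every level of the coarse-to-fine schedule.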

Unlike the other terms in Eq. (4), the PRNU homogeneity term can only be reliably evaluated when the macropixel size is sufficiently large, due to the weak nature of the PRNU. The above coarse-to-fine framework makes it possible to reliably integrate the PRNU homogeneity information at the coarse levels. We therefore set λetp = 0 once the maximal macropixel size m < 16, at which point the information from the PRNU is no longer reliable. One

of the key steps in Algorithm 1 is the incremental update of the mean colors, mean positions and forgery probability for calculating the energy value of each superpixel. Just as the other terms, the PRNU homogeneity term can also be efficiently updated in an incremental manner as the assignment of a macropixel changes. The reader is referred to the Appendix for more details.

To see how the PRNU homogeneity term affects the segmentation, we show an example of an object removal forgery in Fig. 3. We use the ratio of the average forgery probability Pf in the forged regions to the average forgery probability Pp in the pristine regions as the indicator for differentiating the forged regions from the pristine ones. For the image of size 1920×1080 px, we vary the superpixel number K from 1000 to 3000 and show the result in Fig. 3f. We can observe a higher ratio for EG-ETPS when the superpixel number K < 2200, which indicates an easier separation of the forged and the pristine regions. The forgery probability maps for K = 1800 are shown in Fig. 3e. It can be observed that EG-ETPS outputs a more homogeneous and coherent probability map.

FIGURE 3. The effect of the PRNU homogeneity term, i.e. Eq. (8), on the segmentation results. For ETPS and EG-ETPS, we used the same parameters except for λetp, which is set to 2 for the results shown above.

B. MULTI-ORIENTATION FORGERY LOCALIZATION SCHEME
1) MULTI-ORIENTATION FORGERY DETECTION
Most existing image forgery detectors apply square detection windows. Such an isotropic detection scheme inherently limits the capability to detect arbitrarily shaped and arbitrarily oriented forgeries. For instance, subtle forgeries such as human limbs or tree branches might be undetectable with a square detection window. Although this issue can be mitigated by the use of detection windows of smaller size, the reliability is also compromised at smaller scales. To allow for more accurate forgery localization, the various shapes and orientations of the forged regions need to be taken into consideration in the design of the localization framework. Inspired by the great success of Faster R-CNN [47] in detecting objects based on anchor boxes, which are a set of predefined bounding boxes of certain scales and aspect ratios, we extend the multi-scale forgery localization scheme


in [28] by adopting detection windows of various aspect ratios and orientations at each scale.

Based on the multi-scale framework in [28], we replace the square detection window at each scale with detection windows of multiple aspect ratios and orientations, as illustrated in Fig. 4. To reduce the computation, we only use 5 scales, i.e. w ∈ {32, 48, 64, 96, 128}, rather than the 7 scales in [28]. For each scale, we use detection windows of 3 aspect ratios (1:1, 1:2 and 1:3) and rotate the window of a specific aspect ratio by a specific set of orientations. Specifically, we consider 11 orientations: 1 orientation {0} for aspect ratio 1:1, 4 orientations {0, π/4, π/2, 3π/4} for aspect ratio 1:2, and 6 orientations {0, π/6, π/3, π/2, 2π/3, 5π/6} for aspect ratio 1:3. Such a configuration ensures that each window does not overlap too much with its rotated versions. Note that the numbers of pixels within the detection windows at the same scale are approximately the same. Taking the scale w = 32 as an example, the sizes of the detection windows of the 3 aspect ratios are configured to 32×32, 22×44 and 18×54 px. For this reason, the offline-trained correlation predictor at one scale can be used to predict the intra-camera correlations, i.e. p(ρi|h1) in Eq. (3), for detection windows of different orientations at the same scale.
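As a concrete sketch, the window configuration described above (5 scales, 3 aspect ratios, 11 orientations, roughly constant pixel count per scale) can be enumerated as follows. The helper names are ours; the short side is obtained by flooring w/√a, which is one simple choice that reproduces the 32×32, 22×44 and 18×54 px windows quoted for w = 32.

```python
import math

SCALES = (32, 48, 64, 96, 128)
# (aspect ratio, orientations) pairs: 1 + 4 + 6 = 11 windows per scale
CONFIGS = (
    (1, (0.0,)),
    (2, tuple(k * math.pi / 4 for k in range(4))),
    (3, tuple(k * math.pi / 6 for k in range(6))),
)

def window_shapes(w):
    """Short side x long side for each aspect ratio at scale w, chosen so
    every window holds roughly the same w*w pixels."""
    shapes = []
    for a, _ in CONFIGS:
        short = int(w / math.sqrt(a))  # flooring keeps the area <= w*w
        shapes.append((short, short * a))
    return shapes
```

Because the pixel count is (approximately) scale-invariant across aspect ratios, a single correlation predictor per scale suffices, as noted in the text.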

FIGURE 4. Detection windows used in our multi-orientation forgery localization scheme.

2) MULTI-ORIENTATION FUSION
Having obtained 11 candidate forgery probability maps at each scale (corresponding to each of the 11 multi-oriented detection windows), we need a fusion scheme to form a single informative probability map. At first thought, a simple pixel-wise maximum fusion rule, i.e. selecting the largest value of the candidate probabilities for each pixel, would suffice for the task since we aim to detect any possible forgery, but this would also introduce substantial false positives, i.e. mis-labeling pristine pixels as forged. Ideally, the best detection result is obtained when the detection window is perfectly aligned with the forged region. Thus, it is reasonable to accept the forgery probability calculated with the detection window that agrees best with the segmentation result. Suppose Pb, b ∈ {1, . . . , 11}, are the candidate forgery probabilities obtained with the 11 detection windows centered at pixel i. We adopt the following fusion strategy for pixel i:

Pi = r·Pb⋆ + (1 − r)·Psi, (9)

where Psi is the forgery probability of the superpixel si that pixel i belongs to, and b⋆ is the index of the detection window that has the best agreement with si, i.e.

b⋆ = argmax_{b∈{1,...,11}} |Wb ∩ si|. (10)

Here r is the proportion of the detection window Wb⋆ that overlaps with si. It measures how much the probability obtained with detection window Wb⋆ contributes to the final result. r = 1 when the detection window Wb⋆ falls completely within a superpixel, which means we rely only on the result obtained with one of the detection windows, similar to a traditional detector based merely on sliding windows. r < 1 when the detection window Wb⋆ falls across two or more superpixels, in which case (1 − r)Psi compensates for the pixels falling outside of the superpixel that pixel i belongs to. Note that the above fusion is performed at each scale, based on the superpixel segmentation and the forgery probability maps obtained with the 11 detection windows. We introduce a parameter ξ to control the average size of the superpixels relative to the size of the detection window at each scale. Specifically, if the size of the detection window is w×w px, the number of superpixels K is set to

K = ⌊N/(ξw²)⌉, (11)

where N is the total number of pixels in the image and ⌊·⌉ is the rounding operator. To see how the fusion strategy affects the final forgery probability when r < 1, we analyze the following simplified scenarios, as shown in Fig. 5:

FIGURE 5. Three simplified scenarios for multi-orientation forgery probability fusion. The pixel of interest is highlighted in yellow and different color appearances are shown in different colors.

• Scenario I: Two forged regions with different color appearances are segmented into two different superpixels si and sj, which often occurs inside a forged region. Suppose the forgery probability obtained with the detection window Wb⋆ is Pb⋆ and the forgery probabilities for si and sj are Psi and Psj, respectively. The pixels considered in this scenario are all forged pixels, so we can simply assume that Psi ≈ Psj ≈ Pb⋆, since the correlation predictor has been designed to account for different color appearances. Therefore, the final forgery probability Pi ≈ Pb⋆, which means that the fusion is equivalent to selecting the probability corresponding to the detection window that agrees best with the segmentation results.

• Scenario II: Two neighboring regions, with one forged and the other pristine, have different color appearances and are segmented into two superpixels si and sj. This case usually occurs for object insertion forgeries, where the inserted object usually has a color appearance different from the background. In such a case, the pixels within the detection window falling outside the forged region lead to an attenuated Pb⋆. The above fusion strategy compensates


for the attenuated Pb⋆ by adding the term (1 − r)Psi. As Psi is calculated over forged pixels and is expected to be greater than Pb⋆, this results in a Pi > Pb⋆, compensating for the attenuation caused by the pixels falling outside the forged region.

• Scenario III: In this scenario, the forged and pristine regions have the same or similar color appearance, which often occurs in the case of object removal forgery. Due to the lack of distinguishable color appearance, some parts of the two regions are quite likely to be segmented into the same superpixel. For instance, the segmentation may end up with two superpixels si and sj separated by the red line. If we assume that si and window Wb⋆ are equally likely to contain heterogeneous pixels, this scenario is similar to detection based merely on regular detection windows. In practice, because most superpixel algorithms try to utilize as much local information as possible to perform the segmentation, the above fusion is expected to deliver comparable or even better performance than the methods based merely on regular detection windows.
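A minimal sketch of the per-pixel fusion of Eqs. (9) and (10), assuming the detection windows and the superpixel containing the pixel are given as boolean masks (the function and argument names are ours):

```python
import numpy as np

def fuse_pixel(window_masks, window_probs, sp_mask, sp_prob):
    """Multi-orientation fusion of Eqs. (9)-(10) for a single pixel:
    pick the window that overlaps the pixel's superpixel most (Eq. (10)),
    then blend that window's probability with the superpixel's probability,
    weighted by the overlap proportion r (Eq. (9))."""
    overlaps = [np.logical_and(m, sp_mask).sum() for m in window_masks]
    b = int(np.argmax(overlaps))             # best-agreeing window b*
    r = overlaps[b] / window_masks[b].sum()  # fraction of W_b* inside s_i
    return r * window_probs[b] + (1 - r) * sp_prob
```

When the best window lies entirely inside the superpixel, r = 1 and the window probability is used unchanged; otherwise the superpixel probability fills in for the part of the window that leaks into neighboring superpixels, exactly as analyzed in the three scenarios above.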

Finally, we apply the multi-scale fusion strategies proposed in [28] to the fused forgery probability maps obtained at the different scales. The framework of our proposed multi-orientation forgery localization scheme is illustrated in Fig. 6.

FIGURE 6. The multi-orientation forgery localization scheme. For more details about the Adaptive Window (AW) and Multi-Scale (MS) fusion algorithms, please refer to Section IV-C or [28].

IV. EXPERIMENTS
A. DATASETS
Our experiments were conducted on the realistic image tampering dataset (RTD) [28], [48], which contains 220 realistic forgeries captured by 4 different cameras: Canon 60D, Nikon D90, Nikon D7000 and Sony α57 (each camera responsible for 55 forgery images). The images in the RTD dataset are 1920×1080 px RGB images stored in the TIFF format and cover various types of forgeries, including object insertion and object removal forgeries. The RTD dataset provides a good benchmark to evaluate the performance of camera-based

content authentication techniques. Considering the nature of our proposed methods, it is reasonable to conduct evaluations separately on the object insertion and object removal forgeries. However, the wide variety of forgeries in the RTD dataset makes it hard to simply divide them into two categories. Therefore, we divided the RTD dataset into three subsets:
(i) Hard-boundary Forgeries (HBF): This subset contains

the forged images with visually distinguishable boundaries between the pristine and the forged regions. It mainly consists of forgeries created by object insertion, object replacement, color alteration, etc.

(ii) Soft-boundary Forgeries (SBF): This subset contains the forged images with visually smooth and indistinguishable boundaries between the pristine and the forged regions. It mainly consists of forgeries created by background texture synthesis or content-aware filling.

(iii) Mixed-boundary Forgeries (MBF): This subset contains the images with both hard and soft boundaries between the pristine and the forged regions.

The dataset division was done by visually inspecting each image in the RTD dataset. This results in 100, 80 and 40 images in the HBF, SBF and MBF subsets, respectively. Some examples from these three subsets are shown in Fig. 7. Considering the relatively small size of the MBF subset, we merged MBF with both HBF and SBF to create two subsets, i.e. HBF+MBF (140 images) and SBF+MBF (120 images), for evaluating the localization performance on forgeries of the two boundary types.

FIGURE 7. Examples of forged images in hard-boundary forgeries (HBF), soft-boundary forgeries (SBF) and mixed-boundary forgeries (MBF).

B. EVALUATION PROTOCOLS
The localization performance is commonly evaluated by the region F1 score. For an image Ii, 1 ≤ i ≤ S, let Gi be its binarized ground-truth forgery map and Li(τ) be the binary forgery map output by a forgery localization algorithm with a threshold τ. The region F1 score is defined as

Fr(Gi, Li(τ)) = 2·Pr(Gi, Li(τ))·Rr(Gi, Li(τ)) / (Pr(Gi, Li(τ)) + Rr(Gi, Li(τ))), (12)

where Pr(Gi, Li(τ)) and Rr(Gi, Li(τ)) are the precision (the fraction of correctly detected forgery pixels) and the recall


(the fraction of ground-truth forgery pixels detected), respectively. As in [28], we measured the overall performance with a curve of the average Fr(τ) score (each point on the curve is the Fr score averaged over the S images for a given τ) and an average peak Fr score (i.e. the average over the S images of the highest achievable Fr score over all possible thresholds):

F̄r(τ) = (1/S) Σ_{i=1}^{S} Fr(Gi, Li(τ)),
F̄r = (1/S) Σ_{i=1}^{S} max_{τ∈(0,1)} Fr(Gi, Li(τ)). (13)

The Fr score measures the localization performance by considering the overlapping area (in terms of pixel counts) between Gi and Li(τ). However, as pointed out in [48], the region F1 score is insufficient to accurately assess the contour adherence between the ground-truth and the localized forgery regions. In fact, the contour conveys key information about the object shape and provides inferable clues to uncover the potential forged object, which is particularly desirable in the scenario of soft-boundary forgeries. We thus also evaluated the localization performance with the boundary F1 score [49], which has been widely used to assess boundary adherence for image segmentation. For an image Ii, 1 ≤ i ≤ S, let Bi be the contour map of the binarized ground-truth forgery map and Di(τ) be the contour map of the binary forgery map output by a forgery localization algorithm given a threshold τ. The boundary F1 score is defined as

Fb(Bi, Di(τ)) = 2·Pb(Bi, Di(τ))·Rb(Bi, Di(τ)) / (Pb(Bi, Di(τ)) + Rb(Bi, Di(τ))), (14)

where Pb(Bi, Di(τ)) and Rb(Bi, Di(τ)) are the boundary precision and the boundary recall given a distance tolerance θ:

Pb(Bi, Di(τ)) = (1/|Di(τ)|) Σ_{z∈Di(τ)} ⟦d(z, Bi) < θ⟧,
Rb(Bi, Di(τ)) = (1/|Bi|) Σ_{z∈Bi} ⟦d(z, Di(τ)) < θ⟧. (15)

Here, ⟦z⟧ is the Iverson bracket notation, i.e. ⟦z⟧ = 1 if z is true and 0 otherwise, d(·) is the Euclidean distance, and θ determines whether a boundary point has a match or not; it is set to 0.75% of the image diagonal throughout our evaluation. Similarly to the region F1 score, we evaluate the overall boundary quality with a curve of the average Fb(τ) score and an average peak Fb score:

F̄b(τ) = (1/S) Σ_{i=1}^{S} Fb(Bi, Di(τ)),
F̄b = (1/S) Σ_{i=1}^{S} max_{τ∈(0,1)} Fb(Bi, Di(τ)). (16)

C. EVALUATED ALGORITHMS AND PARAMETER SETTINGS
We considered the following single-orientation forgery localization algorithms for comparison with our proposed schemes:
• Simple Thresholding (ST) based algorithm [26]: For ST, we first generated a binarized forgery map by comparing the forgery probability map, obtained with a detection window of 128×128 px, with a threshold τ ∈ [0, 1]. Then we removed the connected forged regions with fewer than 64×64 pixels and applied image dilation with a disk kernel of radius 16 px to generate the final decision map. Note that the ST algorithm is a single-scale detector with no image content involved in the final decision-making process.

• Single-Orientation Multi-Scale (SO+MS) fusion based algorithm [28]: SO+MS formulates forgery localization as an image labeling problem and solves it with a conditional random field (CRF) model. The data term of the CRF model is the average of the threshold-drifted [16], [28] forgery probability maps obtained with single-oriented detection windows at 7 different scales, and the neighborhood interaction of the image content is encoded in the regularization term. The final binary decision map is obtained by optimizing the CRF model.

• Single-Orientation Adaptive Window (SO+AW) fusion based algorithm [28]: SO+AW aims to fuse the forgery probability maps obtained with single-oriented detection windows at 7 scales. Starting from the smallest scale, it looks for a sufficiently confident decision (i.e. a forgery probability that is far from 0.5) for each location in the image. If the decision at a smaller scale is not confident enough, it proceeds to the next larger scale until a sufficiently confident decision and an agreement between two consecutive scales are reached. Finally, the binary decision map is obtained by applying the threshold drift strategy and optimizing the CRF model.

• Segmentation-Guided (SG) based algorithm [28]: The SG algorithm calculates the forgery probability by only considering the pixels with an intensity value close to that of the central pixel (average L1 distance in RGB space less than 15) within a detection window of 128×128 px. It implements the idea of image segmentation but only exploits the intensity difference between individual pixels. Similarly to SO+AW and SO+MS, the threshold drift strategy and CRF are applied to obtain the final binary decision map.
Note that the notations 'SO+MS', 'SO+AW' and 'SG' used

in this article correspond to the 'MSF', 'AW+' and 'SG+' algorithms in [28]. We use the notations 'AW' and 'MS' to denote the adaptive window and multi-scale strategies proposed in [28] for multi-scale fusion. For our proposed segmentation-based schemes, we will use the notation 'SEG+FUSION' to represent the FUSION ∈ {AW, MS} fusion of the probability maps calculated based on the superpixels generated by the SEG ∈ {SLIC, ERS, ETPS, EG-ETPS} algorithm. Similarly, for our proposed multi-orientation forgery localization scheme, we will use the notation 'SEG+MO+FUSION' to represent the FUSION ∈ {AW, MS} fusion of the integrated probability maps obtained with multi-oriented detection windows and the superpixel algorithm SEG ∈ {SLIC, ERS, ETPS, EG-ETPS}.

For SO+MS, SO+AW and SG, we used exactly the same parameters as summarized in Table 2 of [28]. For the segmentation algorithms used in this work, the parameters are given as follows:


• SLIC [41]: We set both the compactness parameter and the number of iterations of pixel assignment and centroid updating to 10.

• ERS [42]: As suggested in [42], we set the weighting factor of the balancing term λ′ = 0.5 and the Gaussian kernel bandwidth σ = 5.0 for calculating the pixel similarities.

• ETPS [43]: We set the weighting factors of both the shape regularity term and the boundary length term to 0.2, i.e. λpos = 0.2 and λb = 0.2.

• EG-ETPS: We used exactly the same parameters as ETPS except for the weighting factor λetp of the PRNU homogeneity term. As can be expected, the setting of λetp depends on the quality of the PRNU. We thus empirically set λetp = 2.5 · exp(−(R² − 1)²/0.3), where R² is the adjusted R-squared coefficient of the correlation predictor trained at the scale of 64×64 px, which is a good indicator of the quality of the PRNU. This results in λetp ∈ [1.8, 2.2] for the four cameras in the RTD dataset.
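The empirical weighting above can be written directly (the function name is ours): it approaches 2.5 for a near-perfect predictor (R² ≈ 1) and decays quickly as the predictor quality drops.

```python
import math

def lambda_etp(r2_adj):
    """Empirical weight of the PRNU homogeneity term,
    lambda_etp = 2.5 * exp(-(R^2 - 1)^2 / 0.3),
    where r2_adj is the adjusted R-squared of the 64x64 correlation predictor."""
    return 2.5 * math.exp(-((r2_adj - 1.0) ** 2) / 0.3)
```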

For our proposed multi-orientation scheme, we set ξ = 0.2 in Eq. (11), which controls the average size of the superpixels relative to the window size at each scale.

1) RESULTS FOR SEGMENTATION-BASED LOCALIZATION SCHEMES
In this experiment, we applied the adaptive window and multi-scale fusion strategies directly to the probability maps obtained by calculating the forgery probability on each generated superpixel. To make the average superpixel size consistent with the detection window size at each scale, we specified the desired superpixel number K of each segmentation algorithm as ⌊N/w²⌉, where N is the number of pixels in the image, w×w is the size of the detection window at a specific scale, and ⌊·⌉ is the rounding operator. We only used 5 scales, i.e. w ∈ {32, 48, 64, 96, 128}, for our segmentation-based schemes, while for the compared methods SO+AW and SO+MS proposed in [28], we still used 7 scales.

We show the average score curves F̄r and F̄b in Fig. 8 as well as the average peak scores in Fig. 9. For easy comparison, we also summarize in Table 1 the highest average F̄r and F̄b and the performance gain of the best segmentation-based scheme relative to the corresponding single-orientation scheme. Note that ST and SG are not included in Table 1 because they have inferior performance to the multi-scale methods, as clearly shown in Fig. 8. We can see that, even with only 5 scales, the segmentation-based forgery localization schemes are able to deliver better performance than SO+AW and SO+MS on the entire RTD dataset. The advantage of the segmentation-based schemes is more evident when the performance is measured by F̄b (the 3rd and 4th columns of Fig. 8), which is increased by 12.4% and 11.8% for the AW and MS multi-scale fusions, respectively. Regarding the localization performance on the two different boundary types, it is not surprising to see that the segmentation-based schemes outperform the

TABLE 1. Segmentation-based vs. single-orientation forgery localization schemes in terms of the highest average region Fr scores and boundary Fb scores across all thresholds. 'SO' stands for the fusion of probability maps obtained with single-oriented detection windows as proposed in [28]. The best performance on each dataset is highlighted in bold.

single-orientation schemes by large margins on HBF+MBF, with a 9.6% increase in Fr and a 21.8% increase in Fb when MS is applied. However, due to the lack of distinct local information to guide the segmentation, the segmentation-based schemes perform slightly worse than those based on detection windows when localizing the forgeries in SBF+MBF.

As for the comparison between the different segmentation algorithms, we can observe that EG-ETPS benefits from the integration of the PRNU homogeneity information and slightly outperforms the other segmentation algorithms. It is worth mentioning that the benefits of EG-ETPS are mainly reflected in detecting the boundaries of the forgeries, so its overall performance gain over the other segmentation algorithms depends on the amount of soft boundaries and is more evident when the performance is measured in terms of the boundary measure Fb. Among the other three segmentation algorithms, ETPS delivers generally better performance than SLIC and ERS. An interesting observation is that the superpixels generated by ERS exhibit excellent boundary adherence and provide the best Fb performance among the four segmentation algorithms on HBF+MBF (see Fig. 9), but the trade-off is that their capability in detecting forgeries in SBF+MBF is greatly compromised.

2) RESULTS FOR MULTI-ORIENTATION LOCALIZATION SCHEMES
In this experiment, we aim to compare the performance of our proposed multi-orientation localization schemes and the methods based on single-oriented detection windows. The comparison results are shown in Fig. 10 and Fig. 11. Similarly, for easy comparison, we also summarize the highest average measures Fr and Fb across all thresholds and


FIGURE 8. Segmentation-based vs. single-orientation forgery localization schemes in terms of average score curves. The Fr and Fb on the RTD dataset (220 images), HBF+MBF (140 images) and SBF+MBF (120 images) subsets are shown in the 1st, 2nd and 3rd rows, respectively.

FIGURE 9. Segmentation-based vs. single-orientation forgery localization schemes in terms of average peak scores. The Fr and Fb on the RTD dataset (220 images), HBF+MBF (140 images) and SBF+MBF (120 images) subsets are shown in the 1st, 2nd and 3rd columns, respectively. The vertical dashed lines indicate the best performance of the single-orientation schemes, i.e. ST, SG, SO+AW and SO+MS.

the performance gain of the best multi-orientation scheme relative to the corresponding single-orientation scheme in Table 2.

We can see that the MS fusion strategy achieves considerably better performance than the AW fusion for our proposed multi-orientation schemes. A closer inspection revealed that AW is more likely to introduce false positives, especially in the regions where the PRNU is substantially attenuated. In addition, compared to the single-orientation schemes, a significant performance improvement can be observed for the multi-orientation localization schemes when the MS fusion strategy is applied, with the highest Fr and Fb increased by

7.5% and 18.7%, respectively, on the entire RTD dataset. The multi-orientation schemes are even more advantageous on the HBF+MBF subset, as evidenced by the performance gains of 10.9% in Fr and 28.6% in Fb. On the SBF+MBF subset, the multi-orientation schemes still deliver performance comparable to the best two single-orientation schemes, SO+AW and SO+MS.

Another important observation for the proposed multi-orientation schemes is that the performance gap between the four segmentation algorithms becomes very small. For the segmentation-based localization schemes, the differences between the segmentation algorithms are much more


FIGURE 10. Multi-orientation vs. single-orientation forgery localization schemes. The Fr and Fb on the RTD dataset (220 images), HBF+MBF (140 images) and SBF+MBF (120 images) subsets are shown in the 1st, 2nd and 3rd rows, respectively.

FIGURE 11. Multi-orientation vs. single-orientation forgery localization schemes in terms of average peak scores. Fr and Fb on the RTD dataset (220 images), HBF+MBF (140 images) and SBF+MBF (120 images) subsets are shown in the 1st, 2nd and 3rd columns, respectively. The vertical dashed lines indicate the best performance of the single-orientation schemes, i.e. ST, SG, SO+AW and SO+MS.

noticeable when localizing the forgeries in the SBF+MBF subset (see the fourth row of Fig. 8). However, for the multi-orientation localization schemes, the capability of localizing soft-boundary forgeries mainly stems from the detection based on multi-oriented detection windows rather than from the segmentation algorithms, which narrows the performance gaps between the different segmentation algorithms. Some examples of forgery localization can be found in Fig. 12, where we only show the results of EG-ETPS and MS for the proposed multi-orientation forgery localization schemes. Note that for MS, we show the average of the probability maps across different scales to approximate the fused probability map in Fig. 12.

D. ROBUSTNESS AGAINST JPEG COMPRESSION
One of the main threats to PRNU-based forgery localization is JPEG compression. Thus, another experiment was

conducted to evaluate the robustness against JPEG compression. We generated 6 new versions of each image by re-saving the corresponding TIFF images with JPEG quality factors of 100, 95, 90, 85, 80 and 75. We then ran all the localization algorithms on the image set of each version and calculated the performance statistics. To calculate the corresponding forgery probabilities, we used the reference PRNU and the correlation predictors trained with TIFF images for the JPEG images of all quality levels.

The results on the entire RTD dataset in terms of Fr and Fb are illustrated in Fig. 13. For the readability of the figure, we did not show the adaptive-window fusion results for the segmentation-based and multi-orientation localization schemes. As can be observed, when the image quality is above 90, both the segmentation-based and the multi-orientation schemes consistently outperform the


FIGURE 12. Example forgery localization results. Color coding: green: detected forged regions (true positives); red: detected pristine regions (false positives); blue: mis-detected forged regions (false negatives). More forgery localization results can be found in the supplementary materials.

TABLE 2. Multi-orientation vs. single-orientation forgery localization schemes in terms of the highest average region Fr scores and boundary Fb scores across all thresholds.

single-orientation methods ST, SG, SO+AW and SO+MS on the entire RTD dataset. We can also observe that the Fr of SG deteriorates more slowly than that of the other methods as the compression becomes more severe (see Figs. 13a and 13c), but this phenomenon is not observed when the performance is measured by Fb.

FIGURE 13. Impact of JPEG compression on localization performance. '∞' on the x-axis corresponds to uncompressed images. (a): segmentation-based vs. single-orientation w.r.t. Fr; (b): segmentation-based vs. single-orientation w.r.t. Fb; (c): multi-orientation vs. single-orientation w.r.t. Fr; (d): multi-orientation vs. single-orientation w.r.t. Fb.

V. CONCLUSION
In this work, we investigated the potential of explicit image segmentation for content forgery localization. We have shown that image segmentation exploiting the visual content is


beneficial for improving the performance of forgery localization based on imperceptible forensic clues, especially for hard-boundary forgeries. While the effectiveness of segmentation based merely on visual content can be compromised for soft-boundary forgeries, this limitation can be mitigated by further integrating the local homogeneity of imperceptible forensic clues to guide the segmentation. To better resolve the issue of detecting soft-boundary forgeries, we also proposed a localization scheme based on the multi-orientation fusion of the forgery probability maps obtained by multi-orientation detection and image segmentation. With the aid of the multi-scale fusion, the multi-orientation detection is effective in detecting soft-boundary forgeries, and the segmentation is particularly good at identifying hard-boundary forgeries. Integrating them in a complementary way leads to the superior localization performance of our proposed multi-orientation schemes, at the expense of extra computational complexity. Although we used PRNU-based forgery localization as an example in this article, we believe that similar ideas can also apply to forgery detectors based on other forensic clues. Further investigations on the potential of image segmentation in other forensic detectors, as well as their combination, will be conducted in our future work.

APPENDIX: INCREMENTAL UPDATE OF FORGERY PROBABILITY
We calculate the forgery probability Psi of superpixel si using Eq. (3):

Psi = (1 + exp( ρsi²/(2σ0²) − (ρsi − ρ̄si)²/(2σ1²) − ln(σ1/σ0) ))⁻¹, (17)

where ρsi and ρ̄si are, respectively, the real and the expected correlations between the reference PRNU and the noise residual of superpixel si, and σ0 and σ1 are the standard deviations of the correlation distributions under hypotheses h0 and h1. As in [28], we use spline interpolation to dynamically obtain σ0 and σ1 based on the number of pixels in si and the standard deviations of the correlation distributions for the scales {32, 48, . . . , 256} estimated from training data. By assuming that the PRNU noise is locally zero-mean, we simplify Eq. (2) as

ρsi = ⟨rsi, nsi⟩ / (||rsi||2 · ||nsi||2), (18)

to allow for an efficient incremental update for each superpixel si. For the expected correlation ρ̄si, we adopt the correlation predictor based on the least-squares estimator [26]:

ρ̄si = fsi θsi, (19)

where θsi is a 15×1 least-squares parameter vector corresponding to the scale closest to the number of pixels in si, and fsi is a 1×15 vector composed of four image features (i.e. the intensity, texture, signal-flattening and texture-intensity features) and their second-order terms for superpixel si. Since the calculation of each image feature is a local operation, e.g. the intensity feature is the mean of the attenuated intensity of an image block, fsi can be updated incrementally when a macropixel j is removed from or added to si:

f_{s_i}^{(t)} = f_{s_i}^{(t-1)} \mp \frac{N_j}{N_{s_i}} \left( f_{s_i}^{(t-1)} - f_j \right),    (20)

where f_{s_i}^{(t-1)} and f_{s_i}^{(t)} are the image features of superpixel s_i before and after the update, f_j and N_j are respectively the image features and the number of pixels of macropixel j, and N_{s_i} is the number of pixels in s_i after the update. The upper sign applies when macropixel j is added to s_i and the lower sign when it is removed.
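The appendix's two computational pieces — the scale-interpolated forgery probability of Eq. (17) and the constant-time mean-feature update of Eq. (20) — can be sketched in Python. This is a minimal illustration, not the authors' implementation: the per-scale standard deviations below are placeholder values (the paper estimates them from training data), linear interpolation stands in for the spline fit of [28], and all function names are hypothetical.

```python
import numpy as np

# Hypothetical training statistics (illustrative values only). Standard
# deviations of the correlation under h0 (forged, PRNU absent) and
# h1 (authentic, PRNU present) for square training scales {32, 48, ..., 256}.
# A simple 1/scale decay is assumed purely for illustration.
SCALES = np.arange(32, 257, 16)
SIGMA0 = 1.0 / SCALES          # assumed h0 spreads
SIGMA1 = 1.5 / SCALES          # assumed h1 spreads


def interp_sigmas(n_pixels):
    """Interpolate sigma0, sigma1 for a superpixel with n_pixels pixels.
    Linear interpolation over the equivalent square side length is used
    here as a simpler stand-in for the paper's spline interpolation."""
    side = np.sqrt(n_pixels)
    return np.interp(side, SCALES, SIGMA0), np.interp(side, SCALES, SIGMA1)


def forgery_probability(rho, rho_hat, n_pixels):
    """Eq. (17): probability that the superpixel is forged, given its
    measured correlation rho and predicted correlation rho_hat."""
    s0, s1 = interp_sigmas(n_pixels)
    z = rho**2 / (2 * s0**2) - (rho - rho_hat)**2 / (2 * s1**2) - np.log(s1 / s0)
    return 1.0 / (1.0 + np.exp(z))


def update_mean_feature(f_prev, n_prev, f_j, n_j, add=True):
    """Eq. (20): incrementally update a mean-type feature when macropixel j
    (features f_j, n_j pixels) is added to or removed from the superpixel.
    Returns the updated features and the updated pixel count."""
    n_new = n_prev + n_j if add else n_prev - n_j
    sign = -1.0 if add else 1.0   # the upper/lower branch of \mp in Eq. (20)
    f_new = f_prev + sign * (n_j / n_new) * (f_prev - f_j)
    return f_new, n_new
```

Under these assumptions, a superpixel whose measured correlation matches the prediction receives a near-zero forgery probability, while one whose correlation collapses toward zero receives a probability close to one; `update_mean_feature` reproduces exactly the recomputed mean, so growing or shrinking a superpixel costs O(1) per macropixel instead of a full re-scan.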

REFERENCES

[1] P. W. Wong and N. Memon, ''Secret and public key image watermarking schemes for image authentication and ownership verification,'' IEEE Trans. Image Process., vol. 10, no. 10, pp. 1593–1601, Oct. 2001.

[2] M. U. Celik, G. Sharma, and A. M. Tekalp, ''Lossless watermarking for image authentication: A new framework and an implementation,'' IEEE Trans. Image Process., vol. 15, no. 4, pp. 1042–1049, Apr. 2006.

[3] H. Farid and S. Lyu, ''Higher-order wavelet statistics and their application to digital forensics,'' in Proc. Conf. Comput. Vis. Pattern Recognit. Workshop, vol. 8, Jun. 2003, p. 94.

[4] J. Wang, T. Li, Y.-Q. Shi, S. Lian, and J. Ye, ''Forensics feature analysis in quaternion wavelet domain for distinguishing photographic images and computer graphics,'' Multimedia Tools Appl., vol. 76, no. 22, pp. 23721–23737, Nov. 2017.

[5] X. Shen, Z. Shi, and H. Chen, ''Splicing image forgery detection using textural features based on the grey level co-occurrence matrices,'' IET Image Process., vol. 11, no. 1, pp. 44–53, Jan. 2017.

[6] L. Verdoliva, D. Cozzolino, and G. Poggi, ''A feature-based approach for image tampering detection and localization,'' in Proc. IEEE Int. Workshop Inf. Forensics Security (WIFS), Dec. 2014, pp. 149–154.

[7] H. Li, W. Luo, X. Qiu, and J. Huang, ''Identification of various image operations using residual-based features,'' IEEE Trans. Circuits Syst. Video Technol., vol. 28, no. 1, pp. 31–45, Jan. 2018.

[8] M. Kirchner, ''Fast and reliable resampling detection by spectral analysis of fixed linear predictor residue,'' in Proc. 10th ACM Workshop Multimedia Secur., 2008, pp. 11–20.

[9] X. Liu, W. Lu, Q. Zhang, J. Huang, and Y.-Q. Shi, ''Downscaling factor estimation on pre-JPEG compressed images,'' IEEE Trans. Circuits Syst. Video Technol., vol. 30, no. 3, pp. 618–631, Mar. 2020.

[10] M. C. Stamm and K. J. R. Liu, ''Forensic detection of image manipulation using statistical intrinsic fingerprints,'' IEEE Trans. Inf. Forensics Security, vol. 5, no. 3, pp. 492–506, Sep. 2010.

[11] X. Lin, C.-T. Li, and Y. Hu, ''Exposing image forgery through the detection of contrast enhancement,'' in Proc. IEEE Int. Conf. Image Process., Sep. 2013, pp. 4467–4471.

[12] X. Kang, M. C. Stamm, A. Peng, and K. J. R. Liu, ''Robust median filtering forensics using an autoregressive model,'' IEEE Trans. Inf. Forensics Security, vol. 8, no. 9, pp. 1456–1468, Sep. 2013.

[13] I. Amerini, L. Ballan, R. Caldelli, A. Del Bimbo, and G. Serra, ''A SIFT-based forensic method for copy–move attack detection and transformation recovery,'' IEEE Trans. Inf. Forensics Security, vol. 6, no. 3, pp. 1099–1110, Sep. 2011.

[14] I. Amerini, R. Becarelli, R. Caldelli, and A. Del Mastio, ''Splicing forgeries localization through the use of first digit features,'' in Proc. IEEE Int. Workshop Inf. Forensics Security (WIFS), Dec. 2014, pp. 143–148.

[15] W. Wang, J. Dong, and T. Tan, ''Exploring DCT coefficient quantization effects for local tampering detection,'' IEEE Trans. Inf. Forensics Security, vol. 9, no. 10, pp. 1653–1666, Oct. 2014.

[16] P. Korus and J. Huang, ''Multi-scale fusion for improved localization of malicious tampering in digital images,'' IEEE Trans. Image Process., vol. 25, no. 3, pp. 1312–1326, Mar. 2016.

[17] E. Kee and H. Farid, ''Exposing digital forgeries from 3-D lighting environments,'' in Proc. IEEE Int. Workshop Inf. Forensics Security, Dec. 2010, pp. 1–6.

[18] W. Zhang, X. Cao, J. Zhang, J. Zhu, and P. Wang, ''Detecting photographic composites using shadows,'' in Proc. IEEE Int. Conf. Multimedia Expo, Jun. 2009, pp. 1042–1045.

[19] E. Kee, J. F. O'Brien, and H. Farid, ''Exposing photo manipulation from shading and shadows,'' ACM Trans. Graph., vol. 33, no. 5, pp. 1–165, 2014.

222658 VOLUME 8, 2020


[20] J. F. O'Brien and H. Farid, ''Exposing photo manipulation with inconsistent reflections,'' ACM Trans. Graph., vol. 31, no. 1, pp. 1–4, 2012.

[21] H. Yao, S. Wang, Y. Zhao, and X. Zhang, ''Detecting image forgery using perspective constraints,'' IEEE Signal Process. Lett., vol. 19, no. 3, pp. 123–126, Mar. 2012.

[22] P. Ferrara, T. Bianchi, A. De Rosa, and A. Piva, ''Image forgery localization via fine-grained analysis of CFA artifacts,'' IEEE Trans. Inf. Forensics Security, vol. 7, no. 5, pp. 1566–1577, Oct. 2012.

[23] T.-T. Ng, S.-F. Chang, and M.-P. Tsui, ''Using geometry invariants for camera response function estimation,'' in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2007, pp. 1–8.

[24] Y.-F. Hsu and S.-F. Chang, ''Camera response functions for image forensics: An automatic algorithm for splicing detection,'' IEEE Trans. Inf. Forensics Security, vol. 5, no. 4, pp. 816–825, Dec. 2010.

[25] I. Yerushalmy and H. Hel-Or, ''Digital image forgery detection based on lens and sensor aberration,'' Int. J. Comput. Vis., vol. 92, no. 1, pp. 71–91, Mar. 2011.

[26] M. Chen, J. Fridrich, M. Goljan, and J. Lukas, ''Determining image origin and integrity using sensor noise,'' IEEE Trans. Inf. Forensics Security, vol. 3, no. 1, pp. 74–90, Mar. 2008.

[27] G. Chierchia, G. Poggi, C. Sansone, and L. Verdoliva, ''A Bayesian-MRF approach for PRNU-based image forgery detection,'' IEEE Trans. Inf. Forensics Security, vol. 9, no. 4, pp. 554–567, Apr. 2014.

[28] P. Korus and J. Huang, ''Multi-scale analysis strategies in PRNU-based tampering localization,'' IEEE Trans. Inf. Forensics Security, vol. 12, no. 4, pp. 809–824, Apr. 2017.

[29] X. Pan, X. Zhang, and S. Lyu, ''Exposing image splicing with inconsistent local noise variances,'' in Proc. IEEE Int. Conf. Comput. Photography (ICCP), Apr. 2012, pp. 1–10.

[30] L. Bondi, S. Lameri, D. Güera, P. Bestagini, E. J. Delp, and S. Tubaro, ''Tampering detection and localization through clustering of camera-based CNN features,'' in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jul. 2017, pp. 1855–1864.

[31] I. Amerini, T. Uricchio, L. Ballan, and R. Caldelli, ''Localization of JPEG double compression through multi-domain convolutional neural networks,'' in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jul. 2017, pp. 1865–1871.

[32] D. Cozzolino and L. Verdoliva, ''Noiseprint: A CNN-based camera model fingerprint,'' IEEE Trans. Inf. Forensics Security, vol. 15, pp. 144–159, 2020.

[33] M. Barni and A. Costanzo, ''Dealing with uncertainty in image forensics: A fuzzy approach,'' in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Mar. 2012, pp. 1753–1756.

[34] M. Fontani, T. Bianchi, A. De Rosa, A. Piva, and M. Barni, ''A framework for decision fusion in image forensics based on Dempster–Shafer theory of evidence,'' IEEE Trans. Inf. Forensics Security, vol. 8, no. 4, pp. 593–607, Apr. 2013.

[35] L. Gaborini, P. Bestagini, S. Milani, M. Tagliasacchi, and S. Tubaro, ''Multi-clue image tampering localization,'' in Proc. IEEE Int. Workshop Inf. Forensics Security (WIFS), Dec. 2014, pp. 125–130.

[36] D. Cozzolino, D. Gragnaniello, and L. Verdoliva, ''Image forgery detection through residual-based local descriptors and block-matching,'' in Proc. IEEE Int. Conf. Image Process. (ICIP), Oct. 2014, pp. 5297–5301.

[37] A. Ferreira, S. C. Felipussi, C. Alfaro, P. Fonseca, J. E. Vargas-Muñoz, J. A. dos Santos, and A. Rocha, ''Behavior knowledge space-based fusion for copy–move forgery detection,'' IEEE Trans. Image Process., vol. 25, no. 10, pp. 4729–4742, Oct. 2016.

[38] G. Chierchia, S. Parrilli, G. Poggi, L. Verdoliva, and C. Sansone, ''PRNU-based detection of small-size image forgeries,'' in Proc. 17th Int. Conf. Digit. Signal Process. (DSP), Jul. 2011, pp. 1–6.

[39] G. Chierchia, D. Cozzolino, G. Poggi, C. Sansone, and L. Verdoliva, ''Guided filtering for PRNU-based localization of small-size image forgeries,'' in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), May 2014, pp. 6231–6235.

[40] K. He, J. Sun, and X. Tang, ''Guided image filtering,'' IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 6, pp. 1397–1409, Jun. 2013.

[41] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk, ''SLIC superpixels compared to state-of-the-art superpixel methods,'' IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 11, pp. 2274–2282, Nov. 2012.

[42] M.-Y. Liu, O. Tuzel, S. Ramalingam, and R. Chellappa, ''Entropy rate superpixel segmentation,'' in Proc. CVPR, Jun. 2011, pp. 2097–2104.

[43] J. Yao, M. Boben, S. Fidler, and R. Urtasun, ''Real-time coarse-to-fine topologically preserving segmentation,'' in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 2947–2955.

[44] K. Yamaguchi, D. McAllester, and R. Urtasun, ''Efficient joint segmentation, occlusion labeling, stereo and flow estimation,'' in Proc. Eur. Conf. Comput. Vis., 2014, pp. 756–771.

[45] W. Zhang, X. Tang, Z. Yang, and S. Niu, ''Multi-scale segmentation strategies in PRNU-based image tampering localization,'' Multimedia Tools Appl., vol. 78, no. 14, pp. 20113–20132, Jul. 2019.

[46] Y. Liu, Q. Guan, X. Zhao, and Y. Cao, ''Image forgery localization based on multi-scale convolutional neural networks,'' in Proc. 6th ACM Workshop Inf. Hiding Multimedia Secur., 2018, pp. 85–90.

[47] S. Ren, K. He, R. Girshick, and J. Sun, ''Faster R-CNN: Towards real-time object detection with region proposal networks,'' in Proc. Adv. Neural Inf. Process. Syst., 2015, pp. 91–99.

[48] P. Korus and J. Huang, ''Evaluation of random field models in multi-modal unsupervised tampering localization,'' in Proc. IEEE Int. Workshop Inf. Forensics Security (WIFS), Dec. 2016, pp. 1–6.

[49] G. Csurka, D. Larlus, and F. Perronnin, ''What is a good evaluation measure for semantic segmentation?'' in Proc. Brit. Mach. Vis. Conf., vol. 27, 2013, pp. 1–11.

XUFENG LIN received the B.E. degree in electronic and information engineering from the Hefei University of Technology, Hefei, China, in 2009, the M.E. degree in signal and information processing from the South China University of Technology, Guangzhou, China, in 2012, and the Ph.D. degree in computer science from The University of Warwick, Coventry, U.K., in 2017. He is currently a Research Fellow with the School of Information Technology, Deakin University, Australia. His research interests include digital forensics, multimedia security, machine learning, and computer vision.

CHANG-TSUN LI (Senior Member, IEEE) received the B.Sc. degree in electrical engineering from National Defence University (NDU), Taiwan, in 1987, the M.Sc. degree in computer science from the U.S. Naval Postgraduate School, USA, in 1992, and the Ph.D. degree in computer science from The University of Warwick, U.K., in 1998. From 1998 to 2002, he was an Associate Professor with the Department of Electrical Engineering, NDU. He was a Visiting Professor with the Department of Computer Science, U.S. Naval Postgraduate School, in 2001. He was a Professor with the Department of Computer Science, The University of Warwick, until January 2017. From January 2017 to February 2019, he was a Professor with Charles Sturt University, Australia. He is currently a Professor with the School of Information Technology, Deakin University, Australia. His research interests include multimedia forensics and security, biometrics, data mining, machine learning, data analytics, computer vision, image processing, pattern recognition, bioinformatics, and content-based image retrieval. The outcomes of his multimedia forensics research have been translated into award-winning commercial products protected by a series of international patents and have been used by a number of police forces and courts of law around the world. He has served as a member of the international program committees of several international conferences, has been involved in the organization of many international conferences and workshops, and has contributed keynote speeches and talks at various international events. He is also an Associate Editor of the EURASIP Journal on Image and Video Processing (JIVP) and IET Biometrics.
