

arXiv:1807.03376v2 [cs.CV] 15 Jul 2018

Beyond Pixels: Image Provenance Analysis Leveraging Metadata

Aparna Bharati 1, Daniel Moreira 1, Joel Brogan 1, Patricia Hale 1, Kevin W. Bowyer 1, Patrick J. Flynn 1, Anderson Rocha 2, Walter J. Scheirer 1

1 University of Notre Dame, IN, USA; 2 University of Campinas, SP, Brazil

Abstract

Creative works, whether paintings or memes, follow unique journeys that result in their final form. Understanding these journeys, a process known as “provenance analysis,” provides rich insights into the use, motivation, and authenticity underlying any given work. The application of this type of study to the expanse of unregulated content on the Internet is what we consider in this paper. Provenance analysis provides a snapshot of the chronology and validity of content as it is uploaded, re-uploaded, and modified over time. Although still in its infancy, automated provenance analysis for online multimedia is already being applied to different types of content. Most current works seek to build provenance graphs based on the shared content between images or videos. This can be a computationally expensive task, especially when considering the vast influx of content that the Internet sees every day. Utilizing non-content-based information, such as timestamps, geotags, and camera IDs, can help provide important insights into the path a particular image or video has traveled during its time on the Internet without large computational overhead. This paper 1 tests the scope and applicability of metadata-based inferences for provenance graph construction in two different scenarios: digital image forensics and cultural analytics.

1. Introduction

Understanding the story behind a visual object is an activity of broad interest. Whether it is determining the palette used to make a painting, the style of a sculptor, or the authenticity of an artwork, deriving the origin and composition of the object at hand has been a difficult but important task for many examiners. Subtle clues derived from the nature of works of art have long been used to provide answers to provenance-related questions [8]. Off-white colors

1 This material is based on research sponsored by DARPA and Air Force Research Laboratory (AFRL) under agreement number FA8750-16-2-0173. Hardware support was generously provided by the NVIDIA Corporation. We also thank the financial support of FAPESP (Grant 2017/12646-3, DéjàVu Project), CAPES (DeepEyes Grant) and CNPq (Grant 304472/2015-8).


Figure 1. Example of an Image Provenance Graph (IPG) showing some common operations performed on images and how they are manifested in the case of provenance. The examples in this case are meme-style images similar to the ones from the photoshopbattles community on the Reddit social media site [60]. The transformations can be as simple as increasing the brightness or as complex as multi-composition. In this paper, we consider the incorporation of metadata to improve the construction of such graphs.

found in the painting Darby and Joan by Laurence Stephen Lowry brought into question its authenticity [10]. Lead content in the paint of Danseuse Bleue et Contrebasses and careful scrutiny of the painter’s signature allowed experts to rightly restore the validity of Edgar Degas’s most famous work [9]. Provenance analysis of this sort has helped historians, cultural analysts and art enthusiasts to analyze the origin, content and growth of works such as these. Although the techniques used to perform provenance analysis have evolved over time [25], it is, in general, still an unsolved problem [63]. In the domain of art history, it is one of the most active and important areas of research [64], as there are still complicated cases where provenance has yet to be established (e.g., the painting Bords de la Seine à Argenteuil [51]) and new avenues for the interpretation of relationships between artworks.

The above case studies might lead one to believe that provenance analysis is a tool to decipher events far in the



past. On the contrary, with the growth in popularity of online digital media, the need for provenance analysis has never been more timely. Current social sentiment can often only be fully understood within the context of online memes and other viral movements [59]. Further, as the lines between real and fake images blur, the extent to which these types of online phenomena can be deployed towards the deception of the public has become deeply concerning [32]. With high-quality cameras and image-editing software at anybody’s disposal, photographs have become easier to forge than paintings or sculptures. We have reached a point where digital forgeries can be produced with fine-grained detail, down to photographic style and sensor noise [49, 44]. These advancements in anti-forensics undermine the content’s credibility, ownership, and authenticity. The current scale at which images and videos are shared requires an automated way of answering such questions.

Image processing and computer vision techniques can be employed to detect correspondences between images or other digital art forms [47, 7, 68]. This kind of correspondence can range from object matching in images [46] to comparing the style [29] and semantics [57] of the two. Provenance analysis can be thought of as ordering pair similarities between multiple image pair sets, and is therefore a natural extension to pairwise image comparison. These subsequent ordered pairings can be modeled as a graph, where each edge denotes a correspondence between a pair, and the end vertices of the edge signify the two respective images. An example of such a graph can be seen in Figure 1. This example shows that a provenance analysis algorithm could be analyzing multiple very close-looking realistic versions of the same visual object. Complex scenarios like this can make content-based similarity metrics unreliable.

Due to the vast range of possible versions of a single original image, the metrics for quantifying the similarity between pairs of images can be noisy. Relying solely upon visual cues to order the different versions into a graph can result in poor provenance reconstructions [11, 52]. Therefore, it becomes pertinent to utilize other sources of data to determine connections. For example, it is difficult to point out a semantic difference between the two images in Figure 2, but the images can be differentiated by inspecting the metadata of the image files. Such a pair of images can be termed semantically similar, as they are related to each other in a semantic way but do not originate from the same source [56, 11]. Matching difficulty can also arise within sets of near-duplicate images, which are generated from a single origin having undergone a series of transformations (e.g., crop → saturate → desaturate). The pixel-level data within these image sets can exhibit ambiguous provenance directionality. Information beyond pixel-level data may be required to detect differences between such images.

To handle scenarios where image content fails to explain

GeoLocation: 48 deg 51' 28.75" N, 2 deg 17' 33.51" E

GeoLocation: 36 deg 6′ 44.22″ N, 115 deg 10′ 23.7″ W

Figure 2. Left: Photo of the Eiffel Tower taken at night in Paris. Right: Photo of the replica of the monument in Las Vegas taken at night. Note that both photos depict the same visual object — only the image file metadata in this case can help us understand that they are completely different scenes. Photos and their metadata were obtained from Flickr [13] and Wikimedia Commons [66].

image evolution, file metadata can be used to help fill in the gaps. In this work, we explore the use of commonly present file metadata tags to improve image provenance analysis. We compare these results against image content-based methods and highlight the advantages and disadvantages of both.

2. Related Work

Provenance analysis is a widely known and studied problem in various data-based domains such as the semantic web and data warehousing [30, 16, 4, 62]. However, provenance analysis for online multimedia has not been as extensively studied in the existing literature. The types of work most relevant and related to the problem of image provenance analysis come from three established concepts in the digital forensics literature: near-duplicate detection [19, 41], image splicing detection [21, 6, 38, 17, 34, 15] and image phylogeny [24, 23, 22]. Most of the proposed methods work towards classifying whether an image is a near-duplicate of the query image in a retrieval context and do not determine the original image among the set of the near-duplicates. However, that particular problem has been studied by the image phylogeny community.

Image phylogeny solutions aim at finding kinship relations between different versions of an image [24]. Similar to provenance analysis, image phylogeny limits its representation to a single-root tree with the original image as the root, even though there can be multiple original images contributing towards the creation of an image. The algorithm


receives a query image and outputs the Image Phylogeny Tree (IPT). That method has also been extended to handle multiple (two) roots by taking spliced images into consideration [56]. An example of this multiple-parent scenario can be observed in Figure 1, where four images (donors) contribute to the content of the central composite image. A constraint of these image phylogeny approaches, which solve very specific cases of image provenance analysis, is that they have dealt with constrained datasets using a limited set of transformations and image formats [39, 22]. In addition, most of them only consider two images to form a composite, thereby limiting the solutions for large-scale general applicability. Thus, new image provenance algorithms must generalize and be evaluated across different forgery datasets, image transformations, file formats and image resolutions to be applicable in real-world situations.

As a step towards a more general framework for image provenance analysis inspired by image phylogeny works, recent work on undirected provenance graph construction [11] adopted a more general taxonomy and dataset proposed by the American National Institute of Standards and Technology (NIST) [55]. It offered the U-phylogeny pipeline as a preliminary approach towards solving provenance analysis, which is not restricted to either a closed set of image transformations or the number of donor images that form multi-parent composites. Results are presented for scenarios with and without the presence of distractors (images that are not related to the provenance history of the query image), showing the approach to be tolerant to irrelevant images. A limitation of the U-phylogeny approach is that it does not provide a directed provenance graph, which is required to understand the evolution of the media object.

In order to overcome the direction limitation and propose a scalable approach, a more complete end-to-end pipeline for image provenance analysis was described in [52]. That method for graph construction first builds dissimilarity matrices based on local image features, and then employs hierarchical clustering to group nodes and draw edges within the final provenance graph. As stated in Section 1, relying solely on image content can lead to noisy edge inference. This is especially true for directed edges, which have been shown to be more difficult to derive than undirected edges [11, 52]. One option for addressing this is the use of metadata related to the images. File metadata has been predominantly used for data and software provenance analysis [1, 4, 30], as such information reveals important clues about a file that cannot be directly derived from the data. Metadata related to online posts which include images can also be utilized for this purpose.

In the image domain, metadata often stores information regarding the device used to capture the image and the software used to process the image. Information provided by these types of tags has been utilized to improve the effectiveness of tasks such as image grouping [37, 45], content-based image retrieval [2, 67], photo classification [14], image annotation [40] and copyright protection [33]. Among these, algorithms establishing semantic correspondences between images, such as automatic grouping or classification, may utilize tags such as date, location, content originator, camera type and scene type [35], whereas those that detect tampering may rely on detecting inconsistencies within the values of these and other tags containing source and copyright information [18, 33]. While metadata has been successfully used for forensics tasks in the past [12, 28, 34, 48], it has not been used for provenance analysis before.

3. Proposed Approach

Image provenance analysis algorithms aim at constructing a provenance graph of related images, given a query image. The provenance graph [52] is a Directed Acyclic Graph (DAG) where each node corresponds to an image in the set of related images and the edges stand for the relationship of sharing duplicate content. The direction of an edge denotes the direction of the flow of content between each pair of images and, collectively, the overall provenance case. In this section, we explain in detail each of the three stages (as seen in Figure 3) of image provenance analysis used in the proposed approach.

3.1. Filtering

The first step required to perform provenance analysis for a given query image involves collecting the set of top-k relevant images. In this work, we follow the solution proposed in [52], which selects a subset of a large source of images (such as millions of images from the Internet), whose elements include samples sharing full content with the query with slight modifications (i.e., near-duplicate images), samples sharing partial content with the query in any form (single or multiple foreground objects, or background), and samples transitively related to the query (e.g., near-duplicates of the images sharing content with the query). In summary, this solution utilizes Optimized Product Quantization (OPQ) to store local Speeded-Up Robust Features (SURF) [7] in an Inverted File index (IVF), with a large number (e.g., ∼400k) of representative centroids. Each image is described through at most 5k SURF features, which are fed to constitute the IVF index. To search the index, multi-stage query expansion is utilized. The first stage mainly retrieves the hosts, while the second stage retrieves donors; further stages retrieve the images transitively related to the query, by replacing the original query with samples retrieved in the previous stages.
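The inverted-file idea behind this filtering step can be sketched in a few lines. The snippet below is a hypothetical, numpy-only toy: the trained OPQ/IVF index and real SURF descriptors are replaced by plain nearest-centroid assignment over tiny 2-D descriptors, so it illustrates the shape of the retrieval (images voted for by the cells the query's descriptors fall into), not the actual pipeline of [52].

```python
import numpy as np

def build_ivf(image_descs, centroids):
    """Toy inverted file: map each centroid id to the set of images
    whose local descriptors fall in that centroid's cell."""
    ivf = {c: set() for c in range(len(centroids))}
    for img_id, descs in image_descs.items():
        # Nearest-centroid assignment (stand-in for a trained OPQ/IVF index).
        d = ((descs[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        for c in d.argmin(axis=1):
            ivf[int(c)].add(img_id)
    return ivf

def filter_top_k(query_descs, centroids, ivf, k):
    """Score each indexed image by how many of the query's descriptors
    land in cells that image also occupies; return the top-k image ids."""
    d = ((query_descs[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    votes = {}
    for c in d.argmin(axis=1):
        for img_id in ivf[int(c)]:
            votes[img_id] = votes.get(img_id, 0) + 1
    return sorted(votes, key=votes.get, reverse=True)[:k]
```

Query expansion would then re-run `filter_top_k` with the retrieved samples as new queries to reach transitively related images.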


Figure 3. Stages of image provenance analysis. The proposed method starts with filtering images related to the provided query image Q. The ‘k’ most relevant images are selected for pairwise image comparison. This step is not present in an oracle scenario, where we assume to have been provided with the perfectly correct set of ‘k’ related images. The images are compared in terms of visual content and metadata, yielding two types of adjacency matrices. The obtained matrices are then combined in the graph construction step to form an IPG.

3.2. Adjacency Matrix Computation

Upon receiving the set of top-k related images, denoted by R, to the query image Q, we build N × N (here, N = |R| + 1) adjacency matrices D, in which each indexed value D[i, j] is the similarity (or dissimilarity) quotient between images i and j. The full matrices are obtained by comparing (N² − N)/2 pairs.

Different from previous work, though, besides using a matrix that relies solely on visual content, we propose the employment of an additional metadata-based asymmetric adjacency matrix that is used to determine the orientation of the pairwise image relations. To the best of our knowledge, this is the first work proposing a way to leverage metadata to complement visual information for the problem of provenance analysis.

For visual comparison, the images can be described using interest point detectors and descriptors (such as SURF [7]) or learned from data using a Convolutional Neural Network. Image description for provenance analysis typically avoids computationally expensive methods such as deep learning because of scalability concerns [68]. An empirical evaluation we conducted comparing SURF [7] and ShuffleNet [69], one of the most efficient deep learning frameworks, highlights this. Ignoring training time, ShuffleNet took 3.5 minutes to describe 10k images using two Nvidia Quadro GPUs, while SURF (a C++ implementation) took 39 seconds for the same images using one GPU. This motivates the usage of SURF-based detection and description of keypoints for the visual comparison between images. Once the images are described, for each image pair, the p most relevant interest points of each image are matched using brute-force pairwise comparison based on the L2 distance between the descriptors. The best-matched correspondences are filtered to retain the geometrically consistent ones, as described in [11]. As a consequence, a symmetric adjacency matrix is obtained with the quantity of matched interest points between each pair of images.
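The brute-force pairwise matching just described can be sketched compactly. The snippet below is a hypothetical, numpy-only illustration: real SURF descriptors and the geometric-consistency filtering of [11] are replaced by toy 2-D descriptors and a mutual-nearest-neighbour check, which stands in for the real consistency filter.

```python
import numpy as np

def match_descriptors(desc_a, desc_b):
    """Brute-force L2 matching between two images' descriptor sets.
    Mutual nearest neighbours are kept as a simple stand-in for the
    geometric-consistency filtering described in the text."""
    d = ((desc_a[:, None, :] - desc_b[None, :, :]) ** 2).sum(-1)
    nn_ab = d.argmin(axis=1)  # best match in b for each descriptor of a
    nn_ba = d.argmin(axis=0)  # best match in a for each descriptor of b
    return [(i, int(j)) for i, j in enumerate(nn_ab) if nn_ba[j] == i]

def visual_adjacency(all_descs):
    """Symmetric adjacency matrix whose entries count matched keypoints,
    obtained from (N^2 - N)/2 pairwise comparisons."""
    ids = sorted(all_descs)
    n = len(ids)
    D = np.zeros((n, n), dtype=int)
    for a in range(n):
        for b in range(a + 1, n):
            m = len(match_descriptors(all_descs[ids[a]], all_descs[ids[b]]))
            D[a, b] = D[b, a] = m
    return ids, D
```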

Commonly, the value of mutual information is used as the degree of pairwise association between images, or as asymmetric weights of the edges in a complete graph among the N images, with no self-loops [52]. In this work, in order to incorporate metadata information at this stage, we introduce a heuristic-based normalized voting to attribute weights to each pairwise image relationship. The voting method is chosen as a complement to the similarity comparison in the visual domain. The heuristics used to obtain the scores for each pair are straightforward metadata-related assumptions in the context of image provenance and rely upon the content of the tags. They include:

Date. To check for the temporal order of content creation, we individually compare the date-related tags – DateTimeOriginal, ModifyDate and CreateDate. Considering two images i and j, for each one of the three dates, whenever available, the provenance relation (i, j) gets one vote if the date of image i is earlier than or equal to the respective date of image j. The relationship in the opposite direction (j, i) is also analogously evaluated.

Location. Near-duplicates of an image (e.g., cropped versions) should have the same geographic location as the original one. Hence, we cast one vote for the pairwise image relationship (i, j), and one vote for the relationship (j, i), if image i shares with image j exactly the same non-null values for the four location-related tags –


GPSLatitude, GPSLatitudeRef, GPSLongitude, GPSLongitudeRef. Although this does not help to define the direction of the provenance between images i and j, since both the (i, j) and (j, i) relationships get one vote, it does help to give them more weight than the other image pairs that do not share location content. In addition, in very complex image compositions where there is not a clear presence of a foreground donor, the location-related metadata tags might be null or missing, contrary to the donors of the composition. Thus, we alternatively cast one vote for the relationship (i, j) if image i has non-null location information and image j is missing it.

Camera. We propose to use camera-based metadata information in a way that is analogous to the location case. If image i and image j share the same non-null content for the camera’s Make, Model and Software tags simultaneously, we cast one vote for both the (i, j) and (j, i) relationships, suggesting near-duplication that maintained image metadata. Similarly, we cast one vote for (i, j) if image i has camera information and image j does not.

Editing. We use the editing-related metadata tags to figure out if either image i or image j was ever manipulated. Given that the provenance direction might occur from a non-manipulated to a manipulated image, we give one vote to the relationship (i, j) if image j has information for any of the ProcessingSoftware, Artist, HostComputer, or ImageResources tags. The relationship in the opposite direction (j, i) is also evaluated in the same manner.

Thumbnail. We extract the respective thumbnails of images i and j. If the thumbnails are exactly the same, both relationships (i, j) and (j, i) get one vote, since it means one image might be generated from the other. Alternatively, if image i has a thumbnail and image j does not have one, then the relationship (i, j) gets one vote, indicating that image i is probably the original one.

These heuristics are used to generate a metadata-based image pairwise adjacency matrix M. For instance, taking images i and j and the possible provenance relationship from i to j, whenever a heuristic is satisfied, the respective value M[i, j] is incremented by one, meaning the cast of one vote for the (i, j) relationship.
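As a concrete illustration of the voting scheme, the sketch below implements a subset of the heuristics (date, location and editing) over plain dicts of EXIF-style tags. The tag names come from the text; the dict representation, the function names, and the omission of the camera and thumbnail rules are simplifications for illustration only.

```python
def metadata_votes(meta_i, meta_j):
    """Votes for the directed relationship (i, j) under a subset of the
    heuristics in the text. meta_i / meta_j are dicts of EXIF-style tags;
    a missing tag is simply absent from the dict. EXIF date strings
    ('YYYY:MM:DD HH:MM:SS') compare correctly as plain strings."""
    votes = 0
    # Date: one vote per date tag where i is not later than j.
    for tag in ("DateTimeOriginal", "ModifyDate", "CreateDate"):
        if tag in meta_i and tag in meta_j and meta_i[tag] <= meta_j[tag]:
            votes += 1
    # Location: identical non-null GPS tags weigh the pair (both directions
    # get this vote); i having GPS while j lacks it favours i -> j.
    gps = ("GPSLatitude", "GPSLatitudeRef", "GPSLongitude", "GPSLongitudeRef")
    if all(t in meta_i and t in meta_j and meta_i[t] == meta_j[t] for t in gps):
        votes += 1
    elif all(t in meta_i for t in gps) and not any(t in meta_j for t in gps):
        votes += 1
    # Editing: evidence of processing on j favours i -> j.
    if any(t in meta_j for t in
           ("ProcessingSoftware", "Artist", "HostComputer", "ImageResources")):
        votes += 1
    return votes

def metadata_matrix(metas):
    """Asymmetric matrix M where M[i][j] counts votes for i -> j."""
    ids = sorted(metas)
    return ids, [[metadata_votes(metas[a], metas[b]) if a != b else 0
                  for b in ids] for a in ids]
```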

Aiming to keep the solution as widely applicable as possible, the tags are selected based on their availability and relevance to the provenance problem. An example of such relevance is shown in Figure 4, which depicts an image pair that is directionally ambiguous. After performing interest-point-based pairwise analysis between the two images in Figure 4(a), a valid argument for either a splicing (left-to-right edge) or removal (right-to-left edge) operation between the two could be made. Utilizing the “DateTimeOriginal” tag from both images disambiguates

Figure 4. Usage of metadata information for determining direction in image pairwise provenance relationships. In (a), the output of interest-point-based analysis between two images is shown. The operation can be either a splicing or removal of the male lion. In (b), according to the date-based metadata, the operation is revealed to be a splice, since the image on the left is older.

the relationship, revealing that the lion was indeed spliced into the image at a later time. While a large array of metadata tags is often present in many images, only a small subset of these tags provides pertinent information useful for discerning inter-image relationships. Furthermore, using tags provided only by specific camera firmwares or only applicable to certain formats (e.g., JPEG) reduces the generalizability of the proposed approach. The tags mentioned here are EXIF tags (details provided in the supplemental material), but the information provided by their values is what holds relevance to provenance. In case EXIF metadata is missing or tampered with, information provided by online image posts, such as the date of submission, can also be utilized in a similar way.

3.3. Graph Construction

Based on the values of the adjacency matrix, the final graph construction step chooses the most feasible set of directed edges (i.e., the set of edges that best represents the sequence of image operations). Each chosen directed edge denotes a parent-child relationship in the graph.

Once the vision-based and metadata-based adjacency matrices are available, one can either individually use them to directly generate a provenance graph, through, for example, the application of Kruskal’s Maximum Spanning Tree (MST) algorithm [43], or, as we are proposing, use a specialized algorithm for constructing a directed provenance graph, such as the clustered provenance graph expansion proposed in [52]. In the latter case, we suggest using the metadata-based asymmetric adjacency matrix to determine the directions of the edges. In the experiments herein reported, we investigate both strategies. In the end, the output graph can be represented as a binary adjacency matrix (BAM). BAM[i, j] is set to 1 whenever there is an edge between images i and j, indicating an i → j flow of content.
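The spanning-tree route can be illustrated with a short sketch. The function below is a hypothetical, dependency-free take on Kruskal-style maximum spanning tree construction over a vote matrix: for each image pair it keeps the direction with more votes, which is one simple way to use an asymmetric matrix, not the clustered expansion of [52].

```python
def max_spanning_edges(weights):
    """Kruskal's algorithm with edges taken heaviest-first: greedily add
    the heaviest edges that do not close a cycle (union-find)."""
    n = len(weights)
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # For an asymmetric matrix, keep the direction with more votes.
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            a, b = (i, j) if weights[i][j] >= weights[j][i] else (j, i)
            edges.append((max(weights[i][j], weights[j][i]), a, b))
    edges.sort(reverse=True)

    tree = []
    for w, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:          # edge joins two components: keep it
            parent[ra] = rb
            tree.append((a, b))
    return tree
```

Marking the returned directed edges in an N × N binary matrix yields the BAM representation described above.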

Understandably, none of the proposed rules guarantees correct inference, as metadata can be manipulated, wrong or missing. Using multiple tags reduces the impact of an incorrect inference and makes the process more robust. To mitigate circumstances where file metadata is unavailable, we demonstrate provenance in an online setting in our experiments using an alternative approach that harvests metadata from website users’ comments as opposed to the file itself. In both scenarios, the proposed approach is designed to tolerate events such as data tampering. As the metadata-compliance score is a cumulative metric, each rule and the corresponding tags contribute to the value used to make the edge decision.

4. Experimental Setup

Here we detail the two evaluation scenarios and describe the characteristics of the corresponding datasets.

4.1. Provenance Analysis for Digital Forensics

NIST has recently released a dataset curated for the tasks of provenance image filtering and graph construction in a forensics context, which is devoid of most of the limitations of the existing datasets. Similar to the experimental setup described in [52], we rely on the development partition of this dataset, since it provides a full set of ground-truth graphs. Named NC2017-Dev1-Beta4, the dataset contains 65 queries, and the ground truth is released in the form of journals depicting provenance graphs. The provenance graph journals were created manually with the help of a proprietary image-editing journaling tool. The graphs include links corresponding to image transformations ranging from simple ones, such as cropping, scaling, sharpening, blurring, and rotation, to complex ones, such as splicing from multiple sources and object removal. The total number of related images per case ranges from 2 to 81. In addition to the images relevant to the provenance of each of the query images, the dataset also contains distractors (i.e., images not related to any query).

Following the protocol proposed by NIST [54], we perform both end-to-end and oracle-filter provenance analysis over this dataset. End-to-end analysis requires performing provenance filtering prior to graph construction [58]. In this case, for each query image, graphs are built upon a list of ranked images that might include distractors and miss genuinely related images due to imperfect image filtering. To obtain these filtered image rank lists, we employ the best solution proposed in [52] and retrieve the top-100 images ranked against the query, which may contain unrelated distractors. Conversely, the oracle analysis does not require a filtering step, but instead starts with perfect ranks, i.e., ranks containing all the relevant images and no distractors.

Orthogonal to the end-to-end versus oracle comparison, we also compare results for metadata-only and visual + metadata solutions. When using only metadata, we compute the vote-based metadata adjacency matrix, as explained in Section 3.2. We use ExifTool [31] to perform file-metadata extraction. A table listing the tags used and their details is provided in the supplemental material. Once the adjacency matrix is computed, we apply Kruskal's maximum spanning tree algorithm [43] to obtain the final provenance graph.
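The spanning-tree step can be sketched as follows. Running Kruskal's algorithm with a union-find structure on weights sorted in descending order yields a maximum spanning tree; the toy vote-based adjacency values are illustrative, whereas the actual system derives them from the ExifTool heuristics:

```python
def maximum_spanning_tree(weights):
    """weights: dict {(i, j): score} over image pairs. Returns chosen edges."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    # Kruskal on weights sorted high-to-low yields a *maximum* spanning tree.
    for (i, j), w in sorted(weights.items(), key=lambda kv: -kv[1]):
        ri, rj = find(i), find(j)
        if ri != rj:  # adding this edge does not create a cycle
            parent[ri] = rj
            tree.append((i, j, w))
    return tree

# Toy adjacency: node 0 is strongly tied to 1 and 2; the 1-2 edge is redundant.
adjacency = {(0, 1): 5, (0, 2): 4, (1, 2): 1}
print(maximum_spanning_tree(adjacency))  # -> [(0, 1, 5), (0, 2, 4)]
```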

For fused metadata and visual solutions, we start with visual content-based adjacency matrices, generated according to the method explained in Section 3. We perform two different computations, one based on SURF [7] and the other based on Maximally Stable Extremal Regions (MSER) [50]. Both solutions were proposed and evaluated in [52]; hence we follow their pipeline: (1) extraction of at most 5k interest points (either with SURF or MSER), (2) computation of adjacency matrices based on the number of geometrically consistent interest-point matches, (3) computation of adjacency matrices based on mutual information, and (4) application of the cluster-based method for generating provenance graphs. For combining visual content and metadata, we proceed as suggested in Section 3: within the cluster-based algorithm, in the step of establishing edge directions, instead of using the mutual-information-based adjacency matrix [52], we consider the metadata-based one and keep the directions with more votes.
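The direction-setting step of the fusion can be sketched as follows. The edge list, vote matrix, and tie-breaking choice are illustrative assumptions; in the actual pipeline, the undirected edges come from visual matching and the votes from the metadata heuristics:

```python
def direct_edges(undirected_edges, votes):
    """Orient each visually selected (i, j) edge toward the direction with
    more metadata votes; ties keep the original ordering as a fallback."""
    directed = []
    for i, j in undirected_edges:
        if votes.get((j, i), 0) > votes.get((i, j), 0):
            directed.append((j, i))  # metadata says j is the parent of i
        else:
            directed.append((i, j))
    return directed

# Votes favor a -> b for the first edge and c -> b for the second.
edges = [("a", "b"), ("b", "c")]
votes = {("a", "b"): 3, ("b", "a"): 1, ("c", "b"): 2, ("b", "c"): 0}
print(direct_edges(edges, votes))  # -> [('a', 'b'), ('c', 'b')]
```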

The provenance graphs generated using the proposed approach, for both the oracle and end-to-end scenarios, are evaluated using the metrics proposed by NIST for the provenance task [55]. The metrics compare the nodes and edges of the ground-truth and candidate graphs. The corresponding measures, Vertex Overlap (VO) and Edge Overlap (EO), are the harmonic means of precision and recall (F1 scores) for the nodes and edges retrieved by our method. In addition, a unified metric summarizing graph overlap in a single score, the Vertex Edge Overlap (VEO), is also reported; the VEO is the combined F1 score over nodes and edges. All metrics are computed with the NIST MediScore tool [53]. Their values lie in the range [0, 1], where higher values are better.
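Following these definitions, the metrics can be sketched in a few lines. This mirrors the F1-based definitions above but is a simplified stand-in for the official MediScore scorer [53]:

```python
def f1(retrieved, truth):
    """F1 (harmonic mean of precision and recall) over two sets."""
    retrieved, truth = set(retrieved), set(truth)
    if not retrieved or not truth:
        return 0.0
    tp = len(retrieved & truth)
    precision, recall = tp / len(retrieved), tp / len(truth)
    return 2 * precision * recall / (precision + recall) if tp else 0.0

def graph_overlap(cand_nodes, cand_edges, gt_nodes, gt_edges):
    """VO / EO over nodes / edges; VEO over the union of nodes and edges."""
    vo = f1(cand_nodes, gt_nodes)
    eo = f1(cand_edges, gt_edges)
    veo = f1(set(cand_nodes) | set(cand_edges),
             set(gt_nodes) | set(gt_edges))
    return vo, eo, veo

# Candidate recovers all nodes but only one of the two ground-truth edges.
vo, eo, veo = graph_overlap(
    cand_nodes=["a", "b", "c"], cand_edges=[("a", "b")],
    gt_nodes=["a", "b", "c"], gt_edges=[("a", "b"), ("b", "c")],
)
print(round(vo, 2), round(eo, 2), round(veo, 2))  # -> 1.0 0.67 0.89
```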

4.2. Provenance Analysis for Cultural Analytics

To include experiments with more realistic examples, we also evaluate the approaches from Section 3 on the Reddit dataset introduced in [52] and maintained at [20]. This dataset contains provenance cases created from images extracted from the photoshopbattles community on the Reddit website [60]. This community provides a platform for users to experiment with image manipulation in a friendly context. Each thread begins with a single image submitted by one user, which serves as the base image for the manipulations of others, whose contributions appear as comments on the original post. For the purpose of provenance, Moreira et al. [52] utilize this comment structure to obtain 184 provenance graphs with an average graph order of 56. For the sake of fair comparison, we evaluate the variants of the proposed approach on the exact same set. The full set of images from Reddit contains no distractors. This restricts our experiments for provenance analysis in this setting to oracle-filter analysis only, in contrast to the NC2017-Dev1-Beta4 dataset.

Table 1. Results of provenance graph construction over the NIST NC2017-Dev1-Beta4 dataset. We report the mean and the standard deviation for the metrics on the provided 65 queries. Visual results are from Moreira et al. [52]. Best results are in bold.

Data Modality      Solution       | Oracle Filtering                        | End-to-End Analysis
                                  | VO          EO          VEO             | VO          EO          VEO
Visual [52]        Cluster-SURF   | 0.931±0.075 0.124±0.166 0.546±0.096     | 0.853±0.157 0.353±0.236 0.613±0.163
Visual [52]        Cluster-MSER   | 0.892±0.154 0.123±0.161 0.525±0.129     | 0.835±0.180 0.312±0.252 0.585±0.177
Metadata           Kruskal        | 0.999±0.003 0.117±0.099 0.577±0.053     | 0.249±0.115 0.009±0.016 0.130±0.057
Visual + Metadata  Cluster-SURF   | 0.931±0.075 0.445±0.266 0.699±0.148     | 0.853±0.157 0.384±0.248 0.628±0.169
Visual + Metadata  Cluster-MSER   | 0.891±0.154 0.389±0.254 0.651±0.176     | 0.838±0.182 0.345±0.232 0.603±0.174

Since the images in the Reddit dataset are collected from the web, the availability of metadata is restricted by the policies of the Reddit website and of image hosting services such as imgur.com [36]. For that reason, metadata extraction through ExifTool [31] does not deliver useful tags for provenance analysis. As an alternative, we use the Reddit users' comments and posts to estimate the date and time of image uploads, treating them as DateTimeOriginal values and making it possible to invoke the date-based heuristics.
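The fallback described above can be sketched as follows; the post-to-timestamp mapping and field names are assumptions standing in for the actual Reddit comment parsing, and the timestamps are illustrative UNIX epochs:

```python
from datetime import datetime, timezone

def pseudo_metadata(posts):
    """posts: {image_id: unix_timestamp}. Returns EXIF-style records so the
    same date-based heuristics can run on comment-derived upload times."""
    return {
        image_id: {"DateTimeOriginal":
                   datetime.fromtimestamp(ts, tz=timezone.utc)
                           .strftime("%Y:%m:%d %H:%M:%S")}
        for image_id, ts in posts.items()
    }

meta = pseudo_metadata({"base.jpg": 1530000000, "edit.jpg": 1530003600})
# The earlier upload is treated as the likely parent by the date heuristic;
# the EXIF-style "%Y:%m:%d %H:%M:%S" strings compare correctly as text.
assert meta["base.jpg"]["DateTimeOriginal"] < meta["edit.jpg"]["DateTimeOriginal"]
```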

An important comment can be made here about the restricted availability of metadata and the apparently limited applicability of the present solution. Although metadata might not be available to the general public, image hosting websites might still store it, and could therefore apply the method internally or under legal demand. Other image hosting websites, such as Flickr and Picasa, can be used as image sources that preserve metadata tags, but they do not provide structured information for provenance ground-truth extraction. This makes Reddit the natural choice for obtaining graphs and evaluating provenance in a cultural setting. To evaluate our experimental results on the Reddit dataset, we employ the same metrics and scorer used in the case of the NC2017-Dev1-Beta4 dataset.

5. Experimental Results

The experiments performed on both datasets show that utilizing knowledge from metadata helps in the process of edge inference for provenance. As can be observed from the values reported in Table 1, the proposed method significantly improves total edge overlap, and thereby total graph overlap, since it uses image-content-based information to initially establish connections between images, then relies on metadata to refine edge directions. The tags and checks used in this work yield an edge overlap of 44.5% and a graph overlap (VEO) of ∼70% for provenance in the oracle scenario, improving notably over the current state of the art [52] by ∼15 percentage points (pp). More notably, metadata fusion provides a ∼30pp increase in EO in the oracle cases when compared to [52].

In the end-to-end scenario, metadata usage also improves edge overlap, by ∼3pp, helping the overall graph overlap reach >60%. Provenance analysis solutions thus far have struggled to obtain good edge reconstruction, as can be seen from the disparity between the vertex and edge overlaps. Furthermore, the addition of distractors reduces performance by ∼5pp, implying that semantically similar images within the distractor sets can lead to high inter-image similarity between pairs that should not be related. This can negatively impact greedy graph-construction approaches. Some success and failure provenance cases, including graph visualizations, are presented in the supplemental material.

To understand the contribution of each type of metadata information, we conduct an ablation study on the oracle and end-to-end scenarios using the Visual + Metadata, Cluster-SURF method from Section 4.1. We perform the experiment seven times for each scenario, using only a subset of heuristics in each run. Results are presented in Table 2. In the oracle scenario, while all five tags individually benefit graph EO, the date-based one performs best, followed by thumbnail usage. For that reason, we also present, in the last two rows of Table 2, the results of combining all heuristics except date (to assess the impact of removing the best one), as well as the combination of date and thumbnail (the two best ones). Indeed, the date-based heuristic alone slightly surpasses the combination of heuristics on this particular dataset and scenario. In the end-to-end scenario, in turn, the observations are somewhat different. Metadata tags alone do not improve the results of the visual solution, except for date and the date-thumbnail fusion, with the latter showing the best results. Again, this might be particular to the dataset, where the added distractors probably present more unreliable metadata (due to tampering or removal).


Table 2. Ablation results for oracle and end-to-end provenance. We repeat the experiments seven times for the best solution presented in Table 1 (Visual + Metadata, Cluster-SURF) in both scenarios, keeping only a subset of heuristics activated at a time. Best results in bold.

Heuristic         | Oracle Filtering                        | End-to-End Analysis
                  | VO          EO          VEO             | VO          EO          VEO
Date only         | 0.931±0.075 0.446±0.265 0.700±0.147     | 0.853±0.157 0.389±0.244 0.630±0.169
Location only     | 0.931±0.075 0.394±0.282 0.674±0.154     | 0.853±0.157 0.348±0.241 0.611±0.164
Camera only       | 0.931±0.075 0.388±0.269 0.672±0.147     | 0.853±0.157 0.350±0.234 0.612±0.164
Editing only      | 0.931±0.075 0.396±0.281 0.675±0.153     | 0.853±0.157 0.353±0.237 0.613±0.163
Thumbnail only    | 0.931±0.075 0.411±0.285 0.683±0.155     | 0.853±0.157 0.363±0.238 0.618±0.167
All but Date      | 0.931±0.075 0.394±0.280 0.675±0.152     | 0.853±0.157 0.345±0.247 0.610±0.168
Date + Thumbnail  | 0.931±0.075 0.444±0.268 0.699±0.148     | 0.853±0.157 0.391±0.245 0.632±0.169

Table 3. Results of provenance graph construction over the Reddit dataset. We report the average values of the metrics over the 184 cases, as well as the standard deviations. This dataset only allows us to report oracle-filtering results. Visual results are from Moreira et al. [52]. Best results are in bold.

Solution                        | VO          EO          VEO
Visual [52], Cluster-SURF       | 0.757±0.341 0.037±0.034 0.401±0.181
Visual [52], Cluster-MSER       | 0.509±0.388 0.027±0.034 0.271±0.207
Metadata, Kruskal               | 0.969±0.073 0.034±0.086 0.506±0.056
Visual + Metadata, Cluster-SURF | 0.757±0.341 0.085±0.065 0.424±0.193
Visual + Metadata, Cluster-MSER | 0.509±0.388 0.061±0.063 0.288±0.220

That reveals the importance of combining tags, since it leads to a solution that is more robust to metadata tampering.

Provenance analysis becomes significantly more difficult when dealing with real-world scenarios, such as those presented in the Reddit dataset. Although metadata doubles the number of correctly retrieved edges, as seen in Table 3, the edge overlap is still much lower than for the NC2017-Dev1-Beta4 dataset. In the Reddit cases, images can be connected by visual puns, inside jokes, and purely associative content without any direct visual correspondence between them. This is very common in meme-style imagery. Understanding the quirks and sentiments of human language could further help provenance analysis in these contexts, but this has not yet been explored. To perform complex relationship retrieval using image provenance analysis, input from other modalities, such as text comments, may be required.

Since all experiments calculate initial correspondences using only visual image content, the purely visual method and the visual + metadata method perform identically with respect to VO. This metric is generally high, with a low standard deviation, whereas the EO has a very high standard deviation. Due to the vast range of possible transformations, the provenance analysis approaches are not able to detect and map certain image relationships as well as others. The results of the experiments for both scenarios show that SURF detections for image matching are better than MSER detections, which is consistent with the results in [52].

6. Discussion

Image metadata is a valuable asset for improving results on vision-based problems such as image retrieval [61], semantic segmentation [5], and manipulation detection [34]. Our work demonstrates that the task of image provenance analysis also benefits from metadata. External context can corroborate evidence from purely visual techniques, creating an overall better solution to provenance graph reconstruction.

In addition to utilizing information that cannot be derived from the images themselves, metadata-based approaches are computationally very cheap. Furthermore, unlike complex, data-driven, vision-based techniques that require large amounts of training resources, methods like ours require no training at all. Such methods can be deployed easily at large scale, incurring very little performance overhead. Approaches that require large amounts of training data can suffer from the relatively small sizes of currently available provenance datasets, and most datasets published in this field so far are indeed small.

Even though external information can improve image-based approaches, provenance analysis is still far from being solved. This work presents only a preliminary exploration of utilizing metadata in provenance analysis. While our results show improvement, metadata-based approaches have a higher chance of being rendered unreliable by the absence or manipulation of metadata. Further advancements must also focus on the examination of content-derived metadata. Future work could include estimating missing metadata from the content and the available tags [27, 65]. For now, our findings suggest that image-content-based methods should be the fallback option, as metadata alone is more useful for determining edge directions than for edge selection. We surmise that, going forward, the best provenance approaches should rely primarily on image content, but utilize metadata analysis as a secondary refinement system in scenarios where it is present and provides ample evidence.


References

[1] U. Acar, P. Buneman, J. Cheney, J. Van Den Bussche, N. Kwasnikowska, and S. Vansummeren. A graph model of data and workflow provenance. In USENIX Workshop on the Theory and Practice of Provenance, 2010.

[2] C. B. Akgul, D. L. Rubin, S. Napel, C. F. Beaulieu, H. Greenspan, and B. Acar. Content-based image retrieval in radiology: current status and future directions. Springer Journal of Digital Imaging, 24(2):208–222, 2011.

[3] P. Alvarez. Using extended file information (EXIF) file headers in digital evidence analysis. International Journal of Digital Evidence, 2, 2004.

[4] M. K. Anand, S. Bowers, and B. Ludascher. Provenance browser: Displaying and querying scientific workflow provenance graphs. In IEEE International Conference on Data Engineering, pages 1201–1204, 2010.

[5] S. Ardeshir, K. Malcolm Collins-Sibley, and M. Shah. Geo-semantic segmentation. In IEEE Conference on Computer Vision and Pattern Recognition, pages 2792–2799, 2015.

[6] K. Bahrami, A. C. Kot, L. Li, and H. Li. Blurred image splicing localization by exposing blur type inconsistency. IEEE Transactions on Information Forensics and Security, 10(5):999–1009, 2015.

[7] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool. Speeded-up robust features (SURF). Computer Vision and Image Understanding, 110(3):346–359, June 2008.

[8] BBC One. Fake or Fortune? https://www.bbc.co.uk/programmes/b01mxxz6. Accessed on 06-27-2018.

[9] BBC One. Fake or Fortune? Degas and the Little Dancer. https://www.bbc.co.uk/programmes/p00xym5k, 2012.

[10] BBC One. Fake or Fortune? Lowry. https://www.bbc.co.uk/programmes/p02whms0, 2015.

[11] A. Bharati, D. Moreira, A. Pinto, J. Brogan, K. Bowyer, P. Flynn, W. Scheirer, and A. Rocha. U-Phylogeny: Undirected provenance graph construction in the wild. In IEEE International Conference on Image Processing, 2017.

[12] G. K. Birajdar and V. H. Mankar. Digital image forgery detection using passive techniques: A survey. Elsevier Digital Investigation, 10(3):226–245, 2013.

[13] J. Bon. Paris Eiffel Tower. https://www.flickr.com/photos/girolame/3220601379/, 2009. Accessed on 06-25-2018.

[14] M. Boutell and J. Luo. Photo classification by integrating image content and camera metadata. In IEEE International Conference on Pattern Recognition, volume 4, pages 901–904, 2004.

[15] J. Brogan, P. Bestagini, A. Bharati, A. Pinto, D. Moreira, K. Bowyer, P. Flynn, A. Rocha, and W. Scheirer. Spotting the difference: Context retrieval and analysis for improved forgery detection and localization. In IEEE International Conference on Image Processing, pages 4078–4082, 2017.

[16] P. Buneman, S. Khanna, and T. Wang-Chiew. Why and where: A characterization of data provenance. In Springer International Conference on Database Theory, pages 316–330, 2001.

[17] C. Chen, S. McCloskey, and J. Yu. Image splicing detection via camera response function analysis. In IEEE Conference on Computer Vision and Pattern Recognition, pages 5087–5096, 2017.

[18] C.-H. Choi, H.-Y. Lee, and H.-K. Lee. Estimation of color modification in digital images by CFA pattern change. Elsevier Forensic Science International, 226(1-3):94–105, 2013.

[19] O. Chum, J. Philbin, A. Zisserman, et al. Near duplicate image detection: min-hash and tf-idf weighting. In British Machine Vision Conference, volume 810, pages 812–815, 2008.

[20] Computer Vision Research Lab. ND Reddit Provenance Dataset. https://github.com/CVRL/Reddit_Provenance_Datasets, 2018. Accessed on 07-02-2018.

[21] D. Cozzolino, G. Poggi, and L. Verdoliva. Splicebuster: A new blind image splicing detector. In IEEE International Workshop on Information Forensics and Security, pages 1–6, 2015.

[22] A. A. de Oliveira, P. Ferrara, A. De Rosa, A. Piva, M. Barni, S. Goldenstein, Z. Dias, and A. Rocha. Multiple parenting phylogeny relationships in digital images. IEEE Transactions on Information Forensics and Security, 11(2):328–343, 2016.

[23] Z. Dias, S. Goldenstein, and A. Rocha. Toward image phylogeny forests: Automatically recovering semantically similar image relationships. Elsevier Forensic Science International, 231(1):178–189, 2013.

[24] Z. Dias, A. Rocha, and S. Goldenstein. Image phylogeny by minimal spanning trees. IEEE Transactions on Information Forensics and Security, 7(2):774–788, 2012.

[25] J. Douglas. Origins: evolving ideas about the principle of provenance. Currents of Archival Thinking, 24, 2010.

[26] Exiv2. Metadata reference tables. http://www.exiv2.org/metadata.html, 2017. Accessed on 06-28-2018.

[27] J. Fan, H. Cao, and A. C. Kot. Estimating EXIF parameters based on noise features for image manipulation detection. IEEE Transactions on Information Forensics and Security, 8(4):608–618, 2013.

[28] H. Farid. Image forgery detection. IEEE Signal Processing Magazine, 26(2):16–25, 2009.

[29] L. A. Gatys, A. S. Ecker, and M. Bethge. A neural algorithm of artistic style. arXiv preprint arXiv:1508.06576, 2015.

[30] C. Halaschek-Wiener, J. Golbeck, A. Schain, M. Grove, B. Parsia, and J. Hendler. Annotation and provenance tracking in semantic web photo libraries. In Springer International Provenance and Annotation Workshop, pages 82–89, 2006.

[31] P. Harvey. ExifTool. https://www.sno.phy.queensu.ca/˜phil/exiftool/, 2018. Accessed on 07-01-2018.

[32] A. D. Holan. 2016 lie of the year: Fake news. http://www.politifact.com/truth-o-meter/article/2016/dec/13/2016-lie-year-fake-news/, 2016. Accessed on 06-28-2018.

[33] H.-C. Huang and W.-C. Fang. Metadata-based image watermarking for copyright protection. Elsevier Simulation Modelling Practice and Theory, 18(4):436–445, 2010.


[34] M. Huh, A. Liu, A. Owens, and A. A. Efros. Fighting fake news: Image splice detection via learned self-consistency. arXiv preprint arXiv:1805.04096, 2018.

[35] M. J. Huiskes and M. S. Lew. The MIR Flickr retrieval evaluation. In ACM International Conference on Multimedia Information Retrieval, pages 39–43, 2008.

[36] imgur.com. Imgur Privacy Standards. https://help.imgur.com/hc/en-us/articles/201746817-Post-privacy, 2018. Accessed on 06-25-2018.

[37] Q. Iqbal and J. Aggarwall. Applying perceptual grouping to content-based image retrieval: Building images. In IEEE Conference on Computer Vision and Pattern Recognition, volume 1, pages 42–48, 1999.

[38] M. Iuliani, G. Fabbri, and A. Piva. Image splicing detection based on general perspective constraints. In IEEE International Workshop on Information Forensics and Security, pages 1–6, 2015.

[39] H. Jegou, M. Douze, and C. Schmid. Hamming embedding and weak geometric consistency for large scale image search. In Springer European Conference on Computer Vision, pages 304–317, 2008.

[40] J. Johnson, L. Ballan, and L. Fei-Fei. Love thy neighbors: Image annotation by exploiting image metadata. In IEEE International Conference on Computer Vision, pages 4624–4632, 2015.

[41] Y. Ke, R. Sukthankar, and L. Huston. Efficient near-duplicate detection and sub-image retrieval. ACM Multimedia, 4(1):1–5, 2004.

[42] E. Kee, M. K. Johnson, and H. Farid. Digital image authentication from JPEG headers. IEEE Transactions on Information Forensics and Security, 6:1066–1075, 2011.

[43] J. Kruskal. On the shortest spanning subtree of a graph and the traveling salesman problem. Proceedings of the American Mathematical Society, 7(1):48–50, 1956.

[44] H. Li, W. Luo, Q. Rao, and J. Huang. Anti-forensics of camera identification and the triangle test by improved fingerprint-copy attack. arXiv preprint arXiv:1707.07795, 2017.

[45] R. K. Logan, R. S. Szeliski, and M. T. Uyttendaele. Automatic digital image grouping using criteria based on image metadata and spatial information, August 2009. US Patent 7,580,952.

[46] D. Lowe. Object recognition from local scale-invariant features. In IEEE Conference on Computer Vision and Pattern Recognition, volume 2, pages 1150–1157, 1999.

[47] D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.

[48] B. Mahdian and S. Saic. A bibliography on blind methods for identifying image forgery. Elsevier Signal Processing: Image Communication, 25(6):389–399, 2010.

[49] F. Marra, F. Roli, D. Cozzolino, C. Sansone, and L. Verdoliva. Attacking the triangle test in sensor-based camera identification. In IEEE International Conference on Image Processing, pages 5307–5311, 2014.

[50] J. Matas, O. Chum, M. Urban, and T. Pajdla. Robust wide baseline stereo from maximally stable extremal regions. Elsevier Image and Vision Computing, 22(10):761–767, 2004.

[51] A. Mohdin. Is this Monet real or fake, and who gets to decide? https://qz.com/588932/is-this-monet-real-or-fake-and-who-gets-to-decide/, 2016. Accessed on 06-25-2018.

[52] D. Moreira, A. Bharati, J. Brogan, A. Pinto, M. Parowski, K. W. Bowyer, P. J. Flynn, A. Rocha, and W. J. Scheirer. Image provenance analysis at scale. arXiv preprint arXiv:1801.06510, 2018.

[53] National Institute of Standards and Technology. MediScore: Scoring tools for Media Forensics Evaluations. https://github.com/usnistgov/MediScore, 2017. Accessed on 07-01-2018.

[54] National Institute of Standards and Technology. Nimble Challenge 2017 Evaluation. https://www.nist.gov/itl/iad/mig/nimble-challenge-2017-evaluation, 2017. Accessed on 10-05-2017.

[55] National Institute of Standards and Technology. Nimble Challenge 2017 Evaluation Plan. https://www.nist.gov/sites/default/files/documents/2017/09/07/nc2017evaluationplan_20170804.pdf, 2017. Accessed on 06-18-2018.

[56] A. Oliveira, P. Ferrara, A. De Rosa, A. Piva, M. Barni, S. Goldenstein, Z. Dias, and A. Rocha. Multiple parenting identification in image phylogeny. In IEEE International Conference on Image Processing, pages 5347–5351, 2014.

[57] J.-Y. Pan, H.-J. Yang, P. Duygulu, and C. Faloutsos. Automatic image captioning. In IEEE International Conference on Multimedia and Expo, volume 3, pages 1987–1990, 2004.

[58] A. Pinto, D. Moreira, A. Bharati, J. Brogan, K. Bowyer, P. Flynn, W. Scheirer, and A. Rocha. Provenance filtering for multimedia phylogeny. In IEEE International Conference on Image Processing, pages 1502–1506, 2017.

[59] M. Pskowski. Mexico struggles to weed out fake news ahead of its biggest election ever. https://www.theverge.com/2018/6/27/17503444/mexico-election-fake-news-facebook-twitter-whatsapp, 2018. Accessed on 06-28-2018.

[60] Reddit.com. Photoshopbattles. https://www.reddit.com/r/photoshopbattles/, 2017. Accessed on 08-11-2017.

[61] S. Sasikala and R. S. Gandhi. Efficient content based image retrieval system with metadata processing. International Journal of Innovative Research in Science and Technology, 1(10):72–77, 2015.

[62] Y. L. Simmhan, B. Plale, and D. Gannon. A survey of data provenance techniques. ftp://ftp.extreme.indiana.edu/pub/techreports/TR618.pdf, 2005. Accessed on 06-13-2018.

[63] R. D. Spencer and G. D. Sesser. Provenance: Important, Yes, But Often Incomplete and Often Enough, Wrong. https://news.artnet.com/market/the-importance-of-provenance-in-determining-authenticity-29953, 2013. Accessed on 06-22-2018.


[64] The Metropolitan Museum of Art. The Met's Provenance Research Project. https://www.metmuseum.org/about-the-met/policies-and-documents/provenance-research-project, 200. Accessed on 06-22-2018.

[65] C.-M. Tsai, A. Qamra, E. Y. Chang, and Y.-F. Wang. Extent: Inferring image metadata from context and content. In IEEE International Conference on Multimedia and Expo, pages 1270–1273, 2005.

[66] Wikimedia Commons. Las Vegas' Eiffel Tower. https://commons.wikimedia.org/wiki/File:Eiffel_Tower_in_Las_Vegas_(Paris)_at_night.jpg, 2015. Accessed on 06-25-2018.

[67] K.-P. Yee, K. Swearingen, K. Li, and M. Hearst. Faceted metadata for image search and browsing. In ACM SIGCHI Conference on Human Factors in Computing Systems, pages 401–408, 2003.

[68] S. Zagoruyko and N. Komodakis. Learning to compare image patches via convolutional neural networks. In IEEE Conference on Computer Vision and Pattern Recognition, pages 4353–4361, 2015.

[69] X. Zhang, X. Zhou, M. Lin, and J. Sun. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In IEEE Conference on Computer Vision and Pattern Recognition, 2018.