MediaEval 2016: A Hybrid Approach for Verifying Multimedia Use on Twitter
A HYBRID APPROACH FOR VERIFYING MULTIMEDIA USE ON TWITTER
Quoc-Tin Phan, Alessandro Budroni, Cecilia Pasquini, Francesco G. B. De Natale
Department of Information Engineering and Computer Science, University of Trento, Italy
INTRODUCTION
Can you recognize? They are “FAKE”.
EXISTING APPROACHES
THE PROPOSED METHOD
Schema of the proposed method.
MULTIMEDIA ASSESSMENT
1. Search by keywords: online web search using relevant keywords associated with the event.
2. Search by image/video: Google reverse image search and comment retrieval from YouTube.
3. Forensic feature extraction: Non-Aligned Double JPEG Compression, Block Artifact Grid, and Error Level Analysis. We look for the blocks most likely to have undergone modification and extract statistical features from them: min, max, mean, and variance.
4. Textual feature extraction:
   4.1 Extract the most relevant terms from the results of step 1 as a bag-of-words.
   4.2 From the results of step 2, calculate the term frequency of the bag-of-words from 4.1.
   4.3 Calculate the term frequency of bags of negative, positive and “fake” words.
   4.4 Concatenate the features from 4.2 and 4.3 to form the textual features.
5. Textual features together with forensic features are fed to Classifier 1.
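Steps 3 to 5 can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a forensic detector has already produced a non-empty per-block tampering-probability map, and that the search results are available as plain strings. The cue-word lists, the parameter `k`, and all function names are hypothetical.

```python
import re
from collections import Counter
from statistics import mean, pvariance

# Hypothetical cue-word lists; the poster does not give its lexicons.
NEGATIVE_WORDS = ["bad", "terrible"]
POSITIVE_WORDS = ["good", "great"]
FAKE_WORDS = ["fake", "hoax", "photoshopped"]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def forensic_stats(prob_map, k):
    """Step 3: min, max, mean, variance over the k most probable blocks."""
    values = sorted((p for row in prob_map for p in row), reverse=True)[:k]
    return [min(values), max(values), mean(values), pvariance(values)]


def top_terms(documents, k):
    """Step 4.1: the k most frequent terms in the keyword-search results."""
    counts = Counter(tok for doc in documents for tok in tokenize(doc))
    return [term for term, _ in counts.most_common(k)]


def term_frequencies(documents, vocabulary):
    """Steps 4.2 and 4.3: relative frequency of each vocabulary term."""
    tokens = [tok for doc in documents for tok in tokenize(doc)]
    total = len(tokens) or 1  # avoid division by zero on empty results
    counts = Counter(tokens)
    return [counts[term] / total for term in vocabulary]


def textual_features(keyword_results, image_results, k):
    """Step 4.4: concatenate bag-of-words and cue-word frequencies."""
    vocabulary = top_terms(keyword_results, k)
    cue_words = NEGATIVE_WORDS + POSITIVE_WORDS + FAKE_WORDS
    # Assumption: cue words are counted in the image-search results.
    return (term_frequencies(image_results, vocabulary)
            + term_frequencies(image_results, cue_words))


def classifier1_input(keyword_results, image_results, prob_map, k):
    """Step 5: textual features followed by forensic features."""
    return (textual_features(keyword_results, image_results, k)
            + forensic_stats(prob_map, k))
```

The resulting vector has a fixed length (`k` bag-of-words terms, the cue words, and four forensic statistics), so it can be passed to any standard classifier.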
RESULTS AND DISCUSSIONS
Multimedia Signal Processing and Understanding Lab, University of Trento, Italy
✘ Not useful with short text and multiple languages.
✘ Does not take multimedia content into account.
[Figure labels: Hurricane Sandy; sharing; fake topic; real topic]
Unreliable information about events and news shared over Online Social Networks might cause negative consequences for the community.
GIVEN: A TWEET comprising <text, images/video>
REAL/FAKE
INPUT
OUTPUT
SYNTHETIC MANIPULATION
Text-based
Multimedia-Forensic-based
User-based
✘ Sensitive to subsequent modifications and compression.
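[Diagram of the proposed method]
Inputs: Multimedia, Event, Post, User.
Multimedia assessment: Search by keywords and Search by image/video feed Textual feature extraction; Forensic feature extraction yields Forensic features; Textual features and Forensic features are concatenated and passed to Classifier 1.
Tweet credibility assessment: Post-based feature extraction and User-based feature extraction yield Post-based features and User-based features, which are concatenated and passed to Classifier 2.
Score fusion: the outputs of Classifier 1 and Classifier 2 are fused into the final decision.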
TWEET CREDIBILITY ASSESSMENT
1. Post-based feature extraction: useful features reflecting the credibility of a tweet post are extracted, e.g. whether the tweet contains “?” or “!”, and the number of negative sentiment words.
2. User-based feature extraction: useful features reflecting the credibility of a user are extracted, e.g. the number of followers the user has, and whether the user is verified by Twitter.
3. Post-based features together with user-based features are fed to Classifier 2.
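The example features named in the steps above could be computed roughly as follows. This is only a sketch covering those few examples; the negative-word lexicon is hypothetical, and the user is assumed to be a dictionary with `followers_count` and `verified` keys.

```python
import re

# Hypothetical negative-sentiment lexicon; the poster does not list one.
NEGATIVE_WORDS = {"fake", "scam", "terrible", "lie"}


def post_features(text):
    """Post-based features: has '?', has '!', count of negative words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [
        1.0 if "?" in text else 0.0,
        1.0 if "!" in text else 0.0,
        float(sum(tok in NEGATIVE_WORDS for tok in tokens)),
    ]


def user_features(user):
    """User-based features: follower count and verified flag."""
    return [
        float(user["followers_count"]),
        1.0 if user["verified"] else 0.0,
    ]


def classifier2_input(text, user):
    """Concatenate post-based and user-based features for Classifier 2."""
    return post_features(text) + user_features(user)
```

For instance, `classifier2_input("Is this real? What a terrible lie!", {"followers_count": 120, "verified": False})` yields `[1.0, 1.0, 2.0, 120.0, 0.0]`.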
WRONG CONTEXT
SCORE FUSION
On the assumption that a tweet sharing fake images or videos is likely to be fake, a higher weight is assigned to the output of Classifier 1 and a lower weight to the output of Classifier 2.
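One simple way to realize such a fusion is a weighted average of the two classifier outputs. The sketch below assumes both classifiers emit a probability that the tweet is fake, and that the 0.8 : 0.2 ratio mentioned for RUN 2 puts 0.8 on Classifier 1. The function names and the 0.5 decision threshold are illustrative assumptions, not details given on the poster.

```python
def fuse_scores(p_multimedia, p_credibility, w_multimedia=0.8):
    """Weighted average of the Classifier 1 and Classifier 2 outputs."""
    if not 0.0 <= w_multimedia <= 1.0:
        raise ValueError("w_multimedia must lie in [0, 1]")
    return w_multimedia * p_multimedia + (1.0 - w_multimedia) * p_credibility


def label(score, threshold=0.5):
    """Map a fused fake-probability to a label (threshold is assumed)."""
    return "fake" if score >= threshold else "real"
```

With these weights, a confident multimedia verdict dominates: `fuse_scores(0.9, 0.1)` is about 0.74 and is labelled fake even though the credibility classifier leans towards real.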
In the sub-task, we submitted RUN 1, applying only forensic features, and RUN 2, applying both textual and forensic features.
In the main task, we submitted three RUNs: i) RUN 1 applied only the second classification tier; ii) RUN 2 applied two-tier classification with a 0.8 : 0.2 fusion strategy and answered UNKNOWN for cases that suffered from online search errors; iii) RUN 3 was the same as RUN 2 but used the output of classification tier 2 instead of answering UNKNOWN.
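The difference between the three main-task RUNs comes down to what happens when the first tier has no output. The sketch below is one possible reading of that policy: `p1` is the Classifier 1 fake-probability, or `None` when online search failed, and `p2` is the Classifier 2 fake-probability. The function name, the 0.5 threshold, and the assignment of 0.8 to tier 1 are assumptions.

```python
from typing import Optional


def decide(run: int, p1: Optional[float], p2: float,
           w1: float = 0.8, threshold: float = 0.5) -> str:
    """Return 'fake', 'real' or 'unknown' under the given RUN policy."""
    if run not in (1, 2, 3):
        raise ValueError("run must be 1, 2 or 3")

    def to_label(p: float) -> str:
        return "fake" if p >= threshold else "real"

    if run == 1:
        # RUN 1: second classification tier only.
        return to_label(p2)
    if p1 is None:
        # Online search error: RUN 2 abstains, RUN 3 falls back to tier 2.
        return "unknown" if run == 2 else to_label(p2)
    # RUN 2 and RUN 3: weighted fusion of both tiers.
    return to_label(w1 * p1 + (1.0 - w1) * p2)
```

Under this reading, RUN 2 and RUN 3 differ only on tweets where search failed, so RUN 3 answers every case where RUN 2 says UNKNOWN.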
The proposed method is subject to online search errors, which occur for videos NOT hosted on YouTube.
Sub-task results:
        Recall  Precision  F1-score
RUN 1   0.5     0.48       0.49
RUN 2   0.93    0.49       0.64
Our method gains recall if we take into account textual features acquired from online text search and reverse image search. This approach effectively reduces the false negative rate.
Main task results:
        Recall  Precision  F1-score
RUN 1   0.55    0.71       0.62
RUN 2   0.94    0.81       0.87
RUN 3   0.94    0.74       0.83
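As a sanity check, the reported F1-scores can be recomputed from the reported precision and recall using the standard definition F1 = 2PR / (P + R). The snippet below is illustrative; the dictionary simply restates the figures from the two result tables, and the two-run table is assumed to belong to the sub-task.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (recall, precision, reported F1) as given in the result tables.
reported = {
    "sub-task RUN 1": (0.5, 0.48, 0.49),
    "sub-task RUN 2": (0.93, 0.49, 0.64),
    "main task RUN 1": (0.55, 0.71, 0.62),
    "main task RUN 2": (0.94, 0.81, 0.87),
    "main task RUN 3": (0.94, 0.74, 0.83),
}

for name, (recall, precision, f1) in reported.items():
    computed = f1_score(precision, recall)
    print(f"{name}: computed {computed:.2f}, reported {f1:.2f}")
```

All five recomputed values match the reported ones to two decimal places.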