Techniques for Automating Quality Assessment of Context-specific Content on Social Media Services Prateek Dewan PhD Thesis Defense November 14, 2017 [email protected] Committee members Dr. Alessandra Sala Dr. Sanasam Ranbir Singh Dr. Aditya Telang Dr. Ponnurangam Kumaraguru (Advisor)

Page 1: Techniques for Automating Quality Assessment of Context-specific Content on Social Media Services

Techniques for Automating Quality Assessment of Context-specific Content on Social Media Services

Prateek Dewan
PhD Thesis Defense

November 14, 2017

[email protected]

Committee members
Dr. Alessandra Sala
Dr. Sanasam Ranbir Singh
Dr. Aditya Telang
Dr. Ponnurangam Kumaraguru (Advisor)

Page 2

Who am I?

• Data Scientist at Apple
• PhD student since February, 2012 – IIIT-Delhi
• Masters (2010 – 2012), IIIT-Delhi
• Collaborations
  • IBM IRL (Delhi and Bengaluru), Symantec Research Labs (Pune), Dublin City University (Ireland), UFMG (Brazil)
• Worked in Privacy and Security on Online Social Media
• Research interests
  • Applied Machine Learning
  • Natural Language Processing
  • Web Security

Page 3

Online Social Media: The Big Picture

Page 4

“With great power comes great responsibility”

Page 5

Thesis statement

• To design and evaluate automated techniques for quality assessment of context-specific content on social media services in real time
• Focus: Facebook
  • Biggest Online Social Media service
  • 2.01 billion monthly active users
  • 2 out of every 7 human beings on the planet use Facebook
  • Most sought-after OSN for news

Page 6

Proposed Solution

Stages: Identify, Characterize, Model, Prototype, Deploy, Evaluate

Page 7

Facebook Inspector: Demo

Page 8

Scope

• Establishing the definition of poor quality content
• What content counts as poor quality?
  • Untrustworthy
  • Child unsafe
  • Misleading information
  • Hoaxes, scams, clickbait
  • Violence, hate speech
• Definition conforming to
  • Facebook’s community standards [1]
  • Definitions of page spam

[1] https://www.facebook.com/communitystandards

Page 9

Approach

Characterize
• Poor quality posts published on Facebook
• Facebook pages publishing poor quality content
• Misinformation spread on Facebook through images

Model
• Ground truth extraction using URL blacklists, and human annotation
• Experiments with multiple supervised learning techniques
• Two-fold model to identify malicious content in real time

Implement
• Facebook Inspector (FbI) architecture
• Live deployment via REST API and browser plug-ins for Chrome and Firefox
• 3,000+ downloads, 180+ daily active users, 1 million+ posts analyzed
• Evaluation in terms of response time, performance, and usability

Page 11

Dataset

Data type                                             Quantity
Unique posts                                          4,465,371
Unique entities                                       3,373,953
Unique users                                          2,983,707
Unique pages                                          390,246
Unique URLs                                           480,407
Unique posts with one or more URLs                    1,222,137
Unique entities posting URLs                          856,758
Unique posts with one or more malicious URLs          11,217
Unique entities posting one or more malicious URLs    7,962
Unique malicious URLs                                 4,622

Page 12

Establishing Ground Truth

• Extracted posts containing one or more URLs
  • 1.2 million out of 4.4 million posts in total
  • 480k unique URLs
• Used six URL blacklists
  • Google Safe Browsing (malware / phishing)
  • VirusTotal (spam / malware / phishing)
  • SURBL (spam)
  • Web of Trust (trust score)*
  • Spamhaus (spam)
  • PhishTank (phishing)
• Posts containing one or more blacklisted URLs marked as poor quality posts (11,217 in all)
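
The labeling step above can be sketched roughly as follows. This is a minimal illustration, not the thesis code: the URL regex, the function names, and the idea of holding all blacklist hits in one in-memory set are assumptions made for the example (the real pipeline queried six external blacklist services).

```python
import re

# Hypothetical pattern for pulling URLs out of post text; not from the slides.
URL_PATTERN = re.compile(r"https?://\S+")


def extract_urls(message):
    """Return every http(s) URL found in a post's message text."""
    return URL_PATTERN.findall(message)


def label_posts(posts, blacklisted_urls):
    """Map post id -> True if the post has at least one blacklisted URL.

    Posts without any URL are skipped, mirroring the first step on the
    slide (only posts containing one or more URLs are considered).
    """
    labels = {}
    for post_id, message in posts.items():
        urls = extract_urls(message)
        if not urls:
            continue
        labels[post_id] = any(url in blacklisted_urls for url in urls)
    return labels


# Toy data; the domains are placeholders.
posts = {
    "p1": "Win a free phone at http://bad.example/win",
    "p2": "Match report: http://news.example/report",
    "p3": "No links here",
}
blacklist = {"http://bad.example/win"}
labels = label_posts(posts, blacklist)
```

Here `labels` marks `p1` as poor quality, `p2` as not flagged, and leaves `p3` out because it carries no URL.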

Page 13

Web of Trust

• Input: URL (e.g. http://www.domain.com)
• Marked Malicious if
  • Reputation: Unsatisfactory / Poor / Very poor (less than 60) and Confidence: High (greater than 10)
  • OR Category: Negative
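
A minimal sketch of this rule, assuming the reputation and confidence conditions are combined with AND before the OR with the category check (the slide's layout suggests this grouping but does not state it). The function name and the plain-value arguments are hypothetical; this does not call the Web of Trust API.

```python
def is_malicious_wot(reputation, confidence, categories):
    """Apply the slide's thresholds to already-fetched WOT values.

    reputation and confidence are numeric scores; categories is an
    iterable of category-group strings such as "negative".
    """
    # Assumed grouping: low reputation only counts when confidence is high.
    low_reputation = reputation < 60 and confidence > 10
    negative_category = "negative" in {c.lower() for c in categories}
    return low_reputation or negative_category
```

For example, `is_malicious_wot(45, 30, [])` is flagged, while the same reputation with a confidence of 5 is not.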

Page 14

Findings

• Facebook’s current techniques do not suffice
  • 65% of all poor quality posts existed on Facebook after 4 (or more) months
  • Gathered likes from 52,169 unique users; comments from 8,784 unique users
• Facebook’s partnership with Web of Trust?
  • 88% of all malicious URLs had poor reputation on WOT
  • No warning pages

Page 15

Platforms used to post

Page 16

Distribution of poor quality posts

Figure panels: Pages, Users, Entities, Posts

Page 18

Facebook Pages posting poor quality content

Hiding in Plain Sight: Characterizing and Detecting Malicious Facebook Pages. Prateek Dewan, Shrey Bagroy, and Ponnurangam Kumaraguru (Short paper). Published at IEEE/ACM Conference on Advances in Social Networks Analysis and Mining (ASONAM), San Francisco, USA. 2016.

Page 19

Ground Truth extraction: Facebook pages

• 4.4 million posts → 10,341 malicious posts (1,557 pages; 5,868 users) → 627 malicious pages
• Criterion for a malicious page: 1 or more malicious URLs in the most recent 100 posts
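
The page-level criterion can be sketched as below. The data layout (a list of `(timestamp, urls)` tuples per page) and the function name are assumptions for illustration only.

```python
def is_malicious_page(posts, malicious_urls, window=100):
    """Return True if any of the page's most recent `window` posts
    contains a URL from `malicious_urls`.

    posts: list of (timestamp, list_of_urls) tuples in any order.
    """
    # Newest posts first, then keep only the most recent `window`.
    recent = sorted(posts, key=lambda post: post[0], reverse=True)[:window]
    return any(url in malicious_urls for _, urls in recent for url in urls)


# Toy page with 150 posts; integer timestamps stand in for real dates.
bad = {"http://bad.example/x"}
old_hit = [(t, ["http://bad.example/x"] if t == 10 else []) for t in range(150)]
new_hit = [(t, ["http://bad.example/x"] if t == 120 else []) for t in range(150)]
```

With this toy data, `old_hit` is not flagged because its only malicious URL falls outside the latest 100 posts, while `new_hit` is.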

Page 20

Dataset of pages posting poor quality content

WOT response    No. of pages   No. of posts
Child unsafe    387            10,891
Untrustworthy   317            8,057
Questionable    312            8,859
Negative        266            5,863
Adult content   162            3,290
Spam            124            4,985
Phishing        39             495
Total           627 (31)       20,999

• Numbers in brackets are Verified pages

Page 21

Content analysis (page names)

• Sentence tokenization → Word tokenization → Case normalization → Stemming → Stopword removal
• N-gram analysis (n = 1, 2, 3)
• Politically polarized entities amongst poor quality pages
  • British National Party (BNP), The Tea Party, English Defense League, American Defense League, American Conservatives, Geert Wilders supporters…
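
The preprocessing chain and n-gram counting above might look roughly like this. It is a standard-library sketch under stated assumptions: the tiny stopword set and the crude suffix-stripping function are stand-ins (the slides do not name the stemmer or stopword list used), and all function names are hypothetical.

```python
import re
from collections import Counter

# Illustrative stopword list only; a real run would use a fuller one.
STOPWORDS = {"the", "of", "and", "a", "in", "for", "to"}


def crude_stem(word):
    """Very rough suffix stripper standing in for a real stemmer."""
    for suffix in ("ing", "ers", "er", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(name):
    """Sentence split, word tokenize, lowercase, stem, drop stopwords."""
    tokens = []
    for sentence in re.split(r"[.!?]+", name):
        for word in re.findall(r"[A-Za-z']+", sentence):
            stemmed = crude_stem(word.lower())
            if stemmed not in STOPWORDS:
                tokens.append(stemmed)
    return tokens


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(names, n):
    """Count n-grams across a collection of page names."""
    counts = Counter()
    for name in names:
        counts.update(ngrams(preprocess(name), n))
    return counts


bigrams = ngram_counts(["The Tea Party", "Tea Party supporters"], 2)
```

Running the same call with `n` set to 1 or 3 gives the unigram and trigram counts.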

Page 22

Network analysis

• Collusive behavior within pages posting poor quality content

Figure panels: Shares, Likes, Comments

Page 23

Temporal activity

• Activity ratio: $\dfrac{\text{no. of time units active}}{\text{total no. of time units during complete observation period}}$
• Malicious pages are more active than benign pages
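
As an illustration, the activity ratio can be computed from post timestamps as follows. The choice of one calendar day as the time unit is an assumption for this sketch, as are the function name and the example dates.

```python
from datetime import date, datetime


def activity_ratio(post_times, start, end):
    """Fraction of days in [start, end] on which at least one post was made.

    post_times: iterable of datetime objects; start and end: date objects.
    The day is an assumed time unit, not one taken from the slides.
    """
    total_units = (end - start).days + 1
    if total_units <= 0:
        raise ValueError("end must not be before start")
    active_units = {t.date() for t in post_times if start <= t.date() <= end}
    return len(active_units) / total_units


posts = [
    datetime(2015, 11, 13, 21, 30),
    datetime(2015, 11, 13, 23, 5),
    datetime(2015, 11, 15, 9, 0),
]
ratio = activity_ratio(posts, date(2015, 11, 13), date(2015, 11, 16))
```

Two posts on the same day count once, so this page is active on 2 of the 4 observed days.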

Page 25

Why?: The Human Brain - Images versus text

• Human brain processes images 60,000 times faster than text

Page 26

Are we doing enough to "understand" images?

• Most research to analyze social media content focuses on text
  • Topic modelling
  • Sentiment analysis
• Does it capture everything?
  • Studies related to images are limited to small scale
  • Few hundred images manually annotated and analyzed
• What can be done?
  • Automated techniques for image summarization; Deep Learning and Convolutional Neural Networks (CNNs) to scale across large no. of images
  • Domain transfer learning
  • Optical Character Recognition

Page 27

Methodology

• Images posted on Facebook during the Paris Attacks, November 2015
• 3-tier pipeline for extracting high level image descriptors from images

Unique posts             131,548
Unique users             106,275
Posts with images        75,277
Total images extracted   57,748
Total unique images      15,123

Pipeline (figure): Images → Human understandable descriptors
• Tier 1: Visual Themes (Inception v3), with manual calibration
• Tier 2: Image Sentiment (DeCAF trained on SentiBank)
• Tier 3: Text embedded in images: Optical Character Recognition, then Text Sentiment (LIWC) + Topics (TF)

Page 28

Tier I: Visual Themes

• ImageNet Large Scale Visual Recognition Challenge (ILSVRC), 2012
  • 1.2 million images, 1,000 categories
• Winner: Google’s Inception-v3 (top-1 error: 17.2%)
  • 48-layer Deep Convolutional Neural Network

Page 29

Tier I: Visual Themes contd.

• All images labeled using Inception-v3
• Validation:
  • Random sample of 2,545 images annotated by 3 human annotators
  • 38.87% accuracy (majority voting)
• Manual calibration
  • Renamed 7 out of the top 30 (most frequently occurring) labels
  • New accuracy: 51.3%
• Why rename? → Bolo Tie (Inception-v3) vs. Peace For Paris (our dataset)
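
A small sketch of accuracy under majority voting, assuming each of the three annotators records whether the model's label for an image is correct (True) or not (False). That annotation format and the function name are assumptions; the slides only state that majority voting was used.

```python
def majority_vote_accuracy(judgements):
    """Share of images whose predicted label a majority of annotators accept.

    judgements: dict mapping image id -> list of booleans, one per annotator.
    """
    if not judgements:
        raise ValueError("no judgements supplied")
    accepted = sum(
        1 for votes in judgements.values() if sum(votes) > len(votes) / 2
    )
    return accepted / len(judgements)


sample = {
    "img1": [True, True, False],
    "img2": [False, False, True],
    "img3": [True, True, True],
    "img4": [False, False, False],
}
accuracy = majority_vote_accuracy(sample)
```

In this toy sample two of the four labels are accepted by at least two annotators.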

Page 30

Tier II: Image Sentiment

• Domain Transfer Learning
  • Inception-v3’s last layer retrained using SentiBank
• SentiBank
  • Images collected from Flickr using Adjective Noun Pairs (ANPs) as search query
  • ANPs: happy dog, adorable baby, abandoned house
  • Weakly labeled dataset of images carrying emotion
• Final training set – 133,108 negative + 305,100 positive sentiment images
• 10-fold random subsampling
• 69.8% accuracy

Page 31

Tier III: Text embedded in images

• Optical Character Recognition (OCR)
  • Tesseract OCR (Python)
  • 31,689 images had text
• Manually extracted text from a random sample of 1,000 images
• Compared with OCR output using string similarity metrics
  • ~62% accuracy

Tesseract output:
No-one thinks that these people are representative of Christians. So why do so many think that these people are representative of Muslims?
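
The comparison between manually transcribed text and OCR output could be sketched as below. The slides do not say which string similarity metrics were used, so `difflib.SequenceMatcher` from the Python standard library is a stand-in, and the normalization step and function names are assumptions.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse whitespace so layout differences are ignored."""
    return " ".join(text.lower().split())


def similarity(manual_text, ocr_text):
    """Similarity in [0, 1] between a manual transcription and OCR output."""
    return SequenceMatcher(None, normalize(manual_text), normalize(ocr_text)).ratio()


def mean_similarity(pairs):
    """Average similarity over (manual_text, ocr_text) pairs."""
    scores = [similarity(manual, ocr) for manual, ocr in pairs]
    return sum(scores) / len(scores)


exact = similarity("Pray for Paris", "pray  for paris")
partial = similarity("Pray for Paris", "Pray for Pans")
```

Averaging such scores over the manually transcribed sample gives a single accuracy-style figure for the OCR tier.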

Page 32

Image and post text had different topics

• Text embedded in images depicted more negative sentiment than user generated textual content

Figure panels: Text embedded in images, User generated text

Page 33

Sentiment: Images versus text

• Image sentiment was more positive than text sentiment

Figure: Sentiment Value / Volume Fraction (y-axis, 0 to 0.6) against No. of hours after the attacks (x-axis, 8 to 280); series: Post Text, Image Text, Image, Volume Fraction

Page 34

Poor quality image content popular on Facebook

Page 36

Revisiting -- Establishing Ground Truth

• Extracted posts containing one or more URLs
  • 1.2 million out of 4.4 million posts in total
  • 480k unique URLs
• Used six URL blacklists
  • Google Safe Browsing (malware / phishing)
  • VirusTotal (spam / malware / phishing)
  • SURBL (spam)
  • Web of Trust (trust score)*
  • Spamhaus (spam)
  • PhishTank (phishing)
• Posts containing one or more blacklisted URLs marked as poor quality posts (11,217 in all)

Page 37

Ground Truth extraction – Dataset II

• What if a post does not have a URL?
• 500 random Facebook posts x 17 events x 3 annotators
• Definition of malicious post
  • “Any irrelevant or unsolicited messages sent over the Internet, typically to large numbers of users, for the purposes of advertising, phishing, spreading malware, etc. are categorized as spam. In terms of online social media, social spam is any content which is irrelevant / unrelated to the event under consideration, and / or aimed at spreading phishing, malware, advertisements, self promotion etc., including bulk messages, profanity, insults, hate speech, malicious links, fraudulent reviews, scams, fake information etc.”
• Final dataset (all 3 annotators agreed on the same label)
  • 571 malicious posts
  • 3,841 benign posts
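
Keeping only posts on which all three annotators agree can be sketched like this. The dictionary layout and label strings are hypothetical.

```python
def unanimous_labels(annotations):
    """Keep posts whose annotators all chose the same label.

    annotations: dict mapping post id -> list of labels, one per annotator.
    Returns a dict mapping post id -> the agreed label.
    """
    return {
        post_id: labels[0]
        for post_id, labels in annotations.items()
        if labels and len(set(labels)) == 1
    }


annotations = {
    "p1": ["malicious", "malicious", "malicious"],
    "p2": ["benign", "malicious", "benign"],
    "p3": ["benign", "benign", "benign"],
}
final_dataset = unanimous_labels(annotations)
```

Post `p2` is dropped because its annotators disagree, which is the same filter that reduces the annotated sample to the final dataset above.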

Page 38

Feature set: Facebook Posts

Source                 Features
Entity (9)             isPage, gender, pageCategory, hasUsername, usernameLength, nameLength, numWordsInName, locale, pageLikes
Textual content (18)   Presence of !, ?, !!, ??, emoticons (smile, frown), numWords, avgWordLength, numSentences, avgSentenceLength, numDictionaryWords, numHashtags, hashtagsPerWord, numCharacters, numURLs, URLsPerWord, numUppercaseCharacters, numWords / numUniqueWords
Metadata (10)          Application, Presence of facebook.com URL, Presence of apps.facebook.com URL, Presence of Facebook event URL, hasMessage, hasStory, hasPicture, hasLink, type, linkLength
Link (7)               http / https, numHyphens, numParameters, avgParameterLength, numSubdomains, pathLength
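
As a rough illustration of the Link row, the features named there can be derived from a URL with the standard library. The exact definitions are assumptions (for example, counting subdomains as host labels beyond the last two, and averaging over parameter values only); the slides list only the feature names.

```python
from urllib.parse import parse_qsl, urlparse


def link_features(url):
    """Compute the link features named on the slide for a single URL."""
    parsed = urlparse(url)
    host_labels = [label for label in (parsed.hostname or "").split(".") if label]
    params = parse_qsl(parsed.query, keep_blank_values=True)
    value_lengths = [len(value) for _, value in params]
    return {
        "isHttps": parsed.scheme == "https",
        "numHyphens": url.count("-"),
        "numParameters": len(params),
        "avgParameterLength": (
            sum(value_lengths) / len(value_lengths) if value_lengths else 0.0
        ),
        # Naive assumption: everything left of "domain.tld" is a subdomain.
        "numSubdomains": max(len(host_labels) - 2, 0),
        "pathLength": len(parsed.path),
    }


features = link_features(
    "http://login.secure-bank.example.com/verify/account?id=12345&token=abc"
)
```

The resulting dictionary can be fed to any of the classifiers as one slice of the per-post feature vector.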

Page 39

Supervised learning: Dataset I

Classifier / Features   Entity   Text    Metadata   Link    All     Top 7
Naïve Bayes             54.79    52.41   71.60      69.25   56.15   74.72
Decision Tree           63.02    64.78   80.56      82.34   84.67   86.17
Random Forest           63.47    66.25   80.67      82.56   85.05   86.62
SVM (rbf)               61.77    64.89   78.75      81.45   75.89   83.66

Page 40

Supervised learning: Dataset II

Classifier / Features   Entity   Text    Metadata   Link    All
Naïve Bayes             51.67    51.60   72.45      77.58   67.63
Decision Tree           51.66    73.16   79.01      81.04   76.17
Random Forest           52.86    76.56   79.87      81.49   80.56
SVM (rbf)               53.16    76.52   78.18      80.37   73.79

Page 41

Feature set: Facebook Pages

Page features      Likes, talking about, description length, bio, category, name, location, check-ins, …
Posting behavior   Daily activity ratio, post types, post likes, post comments, post shares, post engagement ratio, post language, average post length, no. of unique URLs in posts, no. of unique domains in posts, etc.

• Supervised learning
  • Page + post features
    • 55 features from page information
    • 41 features from posting behavior
  • Bag of words
    • Content generated by pages

Page 42

Supervised learning: Page + post features

Classifier            Feature set   Accuracy (%)   ROC AUC
Naïve Bayesian        Page          63.95          0.685
                      Post          69.61          0.753
                      Page + Post   70.81          0.776
Logistic Regression   Page          67.38          0.745
                      Post          76.55          0.825
                      Page + Post   76.71          0.846
Decision Trees        Page          65.55          0.668
                      Post          71.37          0.720
                      Page + Post   70.81          0.758
Random Forest         Page          67.86          0.750
                      Post          74.95          0.829
                      Page + Post   75.27          0.837

Page 43

Supervised learning: Bag of words

Classifier            Feature set   Accuracy (%)   ROC AUC
Naïve Bayesian        Unigrams      68.27          0.682
                      Bigrams       69.06          0.690
                      Trigrams      69.77          0.697
Logistic Regression   Unigrams      74.18          0.795
                      Bigrams       74.34          0.791
                      Trigrams      73.93          0.789
Decision Trees        Unigrams      68.12          0.678
                      Bigrams       67.05          0.678
                      Trigrams      66.63          0.672
Random Forest         Unigrams      72.26          0.794
                      Bigrams       71.80          0.802
                      Trigrams      72.18          0.794
Sparse NN             Unigrams      81.74          0.862
                      Bigrams       84.12          0.872
                      Trigrams      84.13          0.900

Page 44

Model for real time detection

• Model for pages depends on posts published by pages
  • Can’t be used for detection in real time
• Two-fold supervised learning based model using post features
• Utilizing class probabilities for decision making

Page 45

Decision boundary

Figure: decision boundary over the outputs of Classifier 1 and Classifier 2 (axes from 0 to 1); labels: High, High, Low; regions: Malicious, Benign
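
One way a decision over two classifiers' class probabilities could be written is sketched below. The rule used here (flag a post when either classifier's malicious-class probability is high) and the 0.5 threshold are assumptions for illustration; the transcript does not preserve the actual boundary drawn in the figure.

```python
def two_fold_decision(p_classifier1, p_classifier2, threshold=0.5):
    """Combine two malicious-class probabilities into one label.

    Assumed rule: "malicious" if either probability reaches `threshold`,
    otherwise "benign". Both inputs must lie in [0, 1].
    """
    for p in (p_classifier1, p_classifier2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must be between 0 and 1")
    if max(p_classifier1, p_classifier2) >= threshold:
        return "malicious"
    return "benign"
```

For instance, `two_fold_decision(0.9, 0.1)` returns "malicious", while `two_fold_decision(0.2, 0.3)` returns "benign". Because only post-level probabilities are needed, such a rule can run at the moment a post is seen.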

Page 47

Facebook Inspector (FbI): Architecture

Page 48

FbI stats

Date of public launch         August 23, 2015
Total incoming requests       9 million+
Total public posts analyzed   3.5 million+
Total downloads               5,000+
Daily active users            250+
Total unique browsers         1,250+
Posts marked as malicious     615,000+
Posts marked as benign        2.9 million+

Page 49

FbI evaluation: Response time

• ~80% posts processed within 3 seconds
• Average time per post: 2.635 seconds
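
Both response-time figures are simple summaries of per-post processing times. A minimal sketch, with a hypothetical function name and made-up timings:

```python
def response_time_summary(times_in_seconds, limit=3.0):
    """Return the share of posts handled within `limit` seconds and the mean."""
    if not times_in_seconds:
        raise ValueError("no timings supplied")
    within = sum(1 for t in times_in_seconds if t <= limit)
    return {
        "fraction_within_limit": within / len(times_in_seconds),
        "mean_seconds": sum(times_in_seconds) / len(times_in_seconds),
    }


# Illustrative timings only, not measurements from the deployment.
summary = response_time_summary([1.0, 2.0, 3.0, 6.0])
```

A few slow requests can pull the mean close to the limit even when most posts finish well inside it, which is why the two numbers are reported together.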

Page 50

FbI evaluation: Usability

• Usability study with 53 participants
• SUS score: 81.36 (A grade)
  • Higher perceived usability than >90% of all systems evaluated using SUS scale
• 98.1% participants found FbI “easy to use”
• 67.9% participants would like to use FbI frequently
• Quotes from users:
  • “Saves your time spent on spam links and hence enhances user experience.”
  • “[Facebook Inspector] Can be useful for minors and people who lack the judgement to decide how the post is.”
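
For reference, a sketch of how a SUS score is usually computed, assuming SUS here means the System Usability Scale with its standard scoring: ten items rated 1 to 5, odd-numbered items contribute (rating - 1), even-numbered items contribute (5 - rating), and the sum is multiplied by 2.5. The slides do not show the scoring step, and the function names are hypothetical.

```python
def sus_score(responses):
    """Score one participant's ten SUS ratings (each from 1 to 5)."""
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 responses")
    total = 0
    for item_number, rating in enumerate(responses, start=1):
        if not 1 <= rating <= 5:
            raise ValueError("ratings must be between 1 and 5")
        # Odd items are positively worded, even items negatively worded.
        total += (rating - 1) if item_number % 2 == 1 else (5 - rating)
    return total * 2.5


def mean_sus(all_responses):
    """Average SUS score across participants."""
    scores = [sus_score(responses) for responses in all_responses]
    return sum(scores) / len(scores)
```

A participant who answers 3 to every item scores 50.0, and the study-level figure is the mean over all participants.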

Page 51

Contributions summary

• Identified and characterized poor quality content spread on Facebook, with the purpose of identifying poor quality posts published during news-making events in real time
• Evaluated supervised learning approaches for identifying poor quality posts on Facebook in real time, using entity, textual, metadata, and URL features
• Deployed and evaluated a novel framework and system for real time detection of poor quality posts on Facebook during news-making events

Page 52

How does it help?

• Social media services are the primary source of information for the majority of Internet users
  • Content is unmoderated and crowd-sourced; not everything you see may be true
• Facebook Inspector provides a useful and usable real world solution to assist users
• Methodology for fast and accurate summarization of image datasets pertaining to a given topic
  • Government agencies / brands can use this methodology to quickly produce high-level summaries of events / products and gauge the pulse of the masses

Page 53

Real world impact

• Real time system Facebook Inspector built to identify poor quality content is used by 250+ Facebook users, and has processed over 9 million requests
• A unique dataset of Facebook posts containing malicious URLs, pages posting malicious content, and images depicting misinformation from 20+ news-making events

Page 54

Limitations and future work

• Current system does not incorporate user feedback
  • We would like to enable users to provide feedback to make a more personalized detection model
• Computer vision techniques have limited accuracy on social media content
  • Object detection, sentiment analysis, and optical character recognition techniques we used are not tested thoroughly on social media content
• Identify and rank users on the basis of degree of malice
  • More malicious content generated, higher the ranking

Page 55

Acknowledgements

• NIXI for travel support (eCRS, 2014)
• IIIT-Delhi for travel support (ASONAM, 2017)
• Govt. of India for funding during PhD
• Collaborators and co-authors: Dr. Anand Kashyap, Shrey Bagroy, Anshuman Suri, Varun Bharadhwaj, Aditi Mithal
• Monitoring committee: Dr. Vinayak and Dr. Sambuddho
• Peers: Dr. Niharika Sachdeva, Anupama Aggarwal, Dr. Paridhi Jain, Dr. Aditi Gupta, Srishti Gupta, Rishabh Kaushal
• Members of Precog@IIITD and CERC
• Everyone else who has been part of my journey…

Page 56

Publications – Part of thesis

• Dewan, P., Bagroy, S., and Kumaraguru, P. Hiding in Plain Sight: The Anatomy of Malicious Pages on Facebook. Book chapter, Lecture Notes in Social Networks, Springer 2017 (To appear)
• Dewan, P., Suri, A., Bharadhwaj, V., Mithal, A., and Kumaraguru, P. Towards Understanding Crisis Events On Online Social Networks Through Pictures. IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2017.
• Dewan, P., and Kumaraguru, P. Facebook Inspector (FbI): Towards Automatic Real Time Detection of Malicious Content on Facebook. Social Network Analysis and Mining Journal (SNAM), 2017. Volume 7, Issue 1.
• Dewan, P., Bagroy, S., and Kumaraguru, P. Hiding in Plain Sight: Characterizing and Detecting Malicious Facebook Pages. IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2016 (Short paper)
• Dewan, P., and Kumaraguru, P. Towards Automatic Real Time Identification of Malicious Posts on Facebook. Thirteenth Annual Conference on Privacy, Security and Trust (PST), 2015
• Dewan, P., Kashyap, A., and Kumaraguru, P. Analyzing Social and Stylometric Features to Identify Spear phishing Emails. APWG eCrime Research Symposium (eCRS), 2014

Page 57

Publications – Other

• Kaushal, R., Chandok, S., Jain P., Dewan, P., Gupta, N., and Kumaraguru, P. Nudging Nemo: Helping Users Control Linkability across Social Networks. 9th International Conference on Social Informatics (SocInfo), 2017 (Short paper).
• Deshpande, P., Joshi, S., Dewan, P., Murthy, K., Mohania, M., Agrawal, S. The Mask of ZoRRo: preventing information leakage from documents. Knowledge and Information Systems Journal, 2014
• Mittal, S., Gupta, N., Dewan, P., Kumaraguru, P. Pinned it! A large scale study of the Pinterest network. 1st ACM IKDD Conference on Data Sciences (CoDS), 2014
• Dewan, P., Gupta, M., Goyal, K., and Kumaraguru, P. MultiOSN: Realtime Monitoring of Real World Events on Multiple Online Social Media. IBM ICARE 2013
• Magalhães, T., Dewan, P., Kumaraguru, P., Melo-Minardi, R., and Almeida, V. uTrack: Track Yourself! Monitoring Information on Online Social Media. 22nd International World Wide Web Conference (WWW) (2013)
• Conway M., Dewan P., Kumaraguru P., McInerney L. 'White Pride Worldwide': A Meta-analysis of Stormfront.org. Internet, Politics, Policy 2012: Big Data, Big Challenges?, Oxford Internet Institute, University of Oxford.

Page 58

Thank you!

[email protected]

http://precog.iiitd.edu.in/people/prateek