demo - precog research groupprecog.iiitd.edu.in/publications_files/sonal_defense_slides.pdf ·...
cerc.iiitd.ac.in
Demo
Image Retrieval for Improved Law and Order
Search, Analyse, Predict Image Spread on Twitter
Sonal Goel (MT14026)
Advisor: Dr. Ponnurangam Kumaraguru
Cybersecurity Education and Research Centre (CERC)
cerc.iiitd.ac.in
Thesis committee
Dr. AV Subramanyam, IIIT-Delhi
Dr. Samarth Bharadwaj, IBM-IRL
Dr. Ponnurangam Kumaraguru (Chair), IIIT-Delhi
Recent events
Baba Ram Rahim posing as Lord Vishnu
Doctored picture of the PM
Social media impact
How news spread before and after the arrival of social media
Before             After
Limited reach      Exponential reach
Time lapse: high   Time lapse: ~nil
Localised issues   Globalised/nationalised issues
Problem
Disturbing Events
Social Media
a. Religious
b. Caste
c. Communal
d. Political
e. ....
Law & Order
Research Aim
A real-time image search system that lets security analysts monitor the spread of an image and analyse the users spreading it and the sentiments being propagated.
To predict the spread of an image.
Real Time Image Search
OSINT
Current State of the Art
Content-based image retrieval (CBIR)
- Colour, texture, size, shape: a combination of colour and shape is a more robust feature than either one alone
- Identify the keypoints using SIFT, SURF, or ORB: Rublee et al. showed that ORB is two orders of magnitude faster than SIFT, while performing as well in most situations
• Jain, Anil K., and Aditya Vailaya. "Image retrieval using color and shape." Pattern Recognition 29.8 (1996).
• Ethan Rublee, Vincent Rabaud, Kurt Konolige, and Gary Bradski. Orb: an efficient alternative to sift or surf. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 2564–2571. IEEE, 2011.
Contd.
Real-time image-based search systems:
- Google Reverse Image
- TinEye
Absence of real-time image retrieval for micro-blogging sites to aid security analysts
• https://www.tineye.com/
• https://images.google.com/
Contributions
The system is currently being used by a Government security agency to analyse image spread on micro-blogging sites.
Robust retrieval of modified images.
High-level image features, such as the presence of a face, are better predictors of retweet count than low-level image features, such as the colour intensity of the image.
AIM
Image Retrieval System
Predicting the Spread of an image
Architecture diagram
Text
Image
Keywords
Image DB
Similar images
Image Comparison Methodology
Text Mining
Collects tweets containing images
REST API
Details of data collected
Event name               Total images   Similar set   Dissimilar set
RamRahim                 411            99            312
Kulkarni                 1912           348           1564
ShaniShingnapur          183            67            116
KejriwalInsultsHanuman   666            278           388
CharlieHebdo             570            114           456
Challenges in finding similar images
Images can be cropped
Scaled images
Changed colour, brightness
Text Added
Images stitched with other images
Rotated
Image features to look for similarity
Colour distribution: 3D-colour histogram
Keypoint descriptors
a. DAISY
b. ORB (Oriented FAST and Rotated BRIEF)
c. Improved ORB (ORB + RANSAC)
RANSAC (Random Sample Consensus)
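The colour-distribution feature can be sketched as follows. This is a minimal illustration, not the thesis implementation: the bin count (8 per channel) and the chi-squared distance are assumptions, since the slides name neither, and the function names are hypothetical.

```python
from math import fsum

def colour_histogram_3d(pixels, bins=8):
    """Normalised 3D colour histogram for (r, g, b) pixels with values 0-255.

    The histogram is flattened to a list of bins**3 entries that sum to 1.
    bins=8 is an assumed setting, not one taken from the slides.
    """
    if not pixels:
        raise ValueError("need at least one pixel")
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # Map each 0-255 channel value to a bin index in 0..bins-1.
        ri, gi, bi = r * bins // 256, g * bins // 256, b * bins // 256
        hist[(ri * bins + gi) * bins + bi] += 1.0
    total = float(len(pixels))
    return [count / total for count in hist]

def chi_squared_distance(hist_a, hist_b, eps=1e-10):
    """Chi-squared distance between two histograms (assumed metric).

    0 means identical distributions; larger means more dissimilar.
    """
    return 0.5 * fsum((a - b) ** 2 / (a + b + eps) for a, b in zip(hist_a, hist_b))
```

Two images would then be treated as similar when the distance between their histograms falls below a chosen threshold.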
Histogram
• Accuracy at a given distance differs from event to event
• Avg variance at 3 points (0.2, 0.4, 0.5) is 104.24
[Plot: accuracy (y-axis) vs. distance between histograms of two images (x-axis)]
DAISY features
[Plots: distance for similar images and distance for dissimilar images]
Charlie Hebdo: 570; Similar: 114; Dissimilar: 456
Histogram presentation
ORB
• At a distance of 29, the accuracy in all events is > 85%
• Avg variance for 3 points (29, 32, 35) is 17.6
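ORB descriptors are binary strings, so they are normally compared with the Hamming distance. The sketch below assumes 32-byte descriptors and reuses 29 as a per-descriptor cut-off. How the slides aggregate descriptor distances into an image-level decision is not stated, so that part is left out, and the function names are hypothetical.

```python
def hamming_distance(desc_a: bytes, desc_b: bytes) -> int:
    """Number of differing bits between two equal-length binary descriptors."""
    return sum(bin(x ^ y).count("1") for x, y in zip(desc_a, desc_b))

def match_descriptors(query, candidates, max_distance=29):
    """Brute-force nearest-neighbour matching of binary descriptors.

    For each query descriptor, find the closest candidate and keep the pair
    only if its Hamming distance is at most max_distance (assumed cut-off).
    Returns a list of (query_index, candidate_index, distance) tuples.
    """
    if not candidates:
        return []
    matches = []
    for qi, q in enumerate(query):
        best_index, best_dist = min(
            ((ci, hamming_distance(q, c)) for ci, c in enumerate(candidates)),
            key=lambda pair: pair[1],
        )
        if best_dist <= max_distance:
            matches.append((qi, best_index, best_dist))
    return matches
```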
ORB + RANSAC
[Plot: accuracy (y-axis) vs. true match ratio by RANSAC (x-axis)]
• Avg variance at 3 points (0.33, 0.35, 0.37) is ~ 6.2
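A true match ratio can be illustrated as the fraction of keypoint matches that agree with one geometric model found by RANSAC. The sketch below is an assumption-laden simplification: it fits a similarity transform (scale, rotation, shift) from two matches using complex numbers, whereas the slides do not say which model was fitted. The tolerance, iteration count and function names are hypothetical.

```python
import random

def fit_similarity(p1, q1, p2, q2):
    """Solve q = a * p + b over complex numbers from two correspondences."""
    if p1 == p2:
        return None  # degenerate sample, no unique transform
    a = (q1 - q2) / (p1 - p2)
    return a, q1 - a * p1

def ransac_true_match_ratio(matches, iterations=200, tolerance=3.0, seed=0):
    """Fraction of matches consistent with the best model found by RANSAC.

    matches: list of ((x, y), (x2, y2)) matched keypoint coordinates.
    tolerance: maximum reprojection error, in pixels, for an inlier.
    """
    if len(matches) < 2:
        return 0.0
    rng = random.Random(seed)
    src = [complex(*p) for p, _ in matches]
    dst = [complex(*q) for _, q in matches]
    best_inliers = 0
    for _ in range(iterations):
        # Draw a minimal sample of two matches and fit a candidate model.
        i, j = rng.sample(range(len(matches)), 2)
        model = fit_similarity(src[i], dst[i], src[j], dst[j])
        if model is None:
            continue
        a, b = model
        # Count the matches that the candidate model explains.
        inliers = sum(1 for p, q in zip(src, dst) if abs(a * p + b - q) <= tolerance)
        best_inliers = max(best_inliers, inliers)
    return best_inliers / len(matches)
```

An image pair would then be declared similar when this ratio exceeds a threshold such as the values around 0.35 probed on the slide.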
Comparing FPR and TPR
Event                    ORB (TPR)   Improved ORB (TPR)
BabaRamRahim             0.6         0.7
Kulkarni Ink             0.62        0.78
ShaniShingnapur          0.97        1.0
KejriwalInsultsHanuman   0.90        1.0
AIM: To minimise FPR and achieve best TPR.
Compare TPR of ORB and Improved ORB at FPR=0
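The comparison at FPR = 0 can be sketched like this, assuming each candidate image has a similarity score where higher means more similar, plus a ground-truth label. The function name and the score convention are assumptions. For a distance-based score, the comparison would be flipped.

```python
def tpr_at_zero_fpr(scores, labels):
    """True positive rate at the loosest threshold that still gives FPR = 0.

    scores: similarity score per image (higher = more similar).
    labels: True if the image really is similar to the query, else False.
    """
    positives = [s for s, is_similar in zip(scores, labels) if is_similar]
    negatives = [s for s, is_similar in zip(scores, labels) if not is_similar]
    if not positives:
        raise ValueError("need at least one truly similar image")
    # To accept no dissimilar image, the cut-off must sit at the best negative.
    cutoff = max(negatives) if negatives else float("-inf")
    accepted = sum(1 for s in positives if s > cutoff)
    return accepted / len(positives)
```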
Comparing accuracy with different input images
Images stitched and scaled
Scaling factors for the above images: Cyan: (9.6*3.87), Magenta: (2.08*1.25), Yellow: (1.9*1.2), Red: (2.1*1.23), Green: (1.7*1.3), Blue: (2.7*2.3)
[Plot: accuracy (y-axis) vs. true match ratio (x-axis)]
Image with text and scaled
Scaling factors for the above images: Red: (1.66*1.7), Green: (7.45*5.25), Cyan: (1.2*1.08), Blue: original, size (600*815)
[Plot: accuracy (y-axis) vs. true match ratio (x-axis)]
Cropped, text, stitched images
[Plot: accuracy (y-axis) vs. true match ratio (x-axis)]
Images having less colour
Scaling factors: Green: (2.02*1.54), Red: (1.04*1.28), Cyan: (3.9*2.4), Magenta: (2.0*1.13), Blue: not scaled, Black: original
[Plot: accuracy (y-axis) vs. true match ratio (x-axis)]
Comparing proposed system with Google reverse image search
Kanhaiya Kumar
Proposed System: Total images: 892 (Most similar: 42, Moderately similar: 36, Least similar: 814)
Google Image Search: Total images: 48
[Screenshots: Proposed System and Google Reverse Image]
AIM
Image Retrieval System
Predicting the Spread
Data collection
Event                    Total tweets   Unique tweets
Kulkarni                 1912           404
BabaRamRahim             420            117
KejriwalInsultsHanuman   1400           665
CharlieHebdo             1079           312
ShaniShingnapur          1230           183
RohithVemulla            3104           359
Features analysed
Image features     Tweet features   User features
Mean Red           Tweet length     Status Count
Mean Green         Sentiment        Followers Count
Mean Blue          Hashtag Count    Friends Count
Presence of face   Media Count      Follower_Followee Ratio
                   Mention Count    Verified
                   Tweet Age        Favourites Count
                                    Account Age
• Ethem F Can, Hüseyin Oktay, and R Manmatha. Predicting retweet count using visual cues. International Conference on information & knowledge management, pages 1481–1484. ACM, 2013.
• Bob van de Velde, Albert Meijer, and Vincent Homburg. Police message diffusion on twitter: analysing the reach of social media communications. Behaviour & Information Technology, 2015.
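A few of the listed features can be computed as in the sketch below. It is illustrative only: the function names, the regular expressions for hashtags and mentions, and the definition of the follower-followee ratio as followers divided by friends are all assumptions, not details taken from the slides.

```python
import re

def image_colour_means(pixels):
    """Mean red, green and blue values over a list of (r, g, b) pixels."""
    n = len(pixels)
    return {
        "mean_red": sum(p[0] for p in pixels) / n,
        "mean_green": sum(p[1] for p in pixels) / n,
        "mean_blue": sum(p[2] for p in pixels) / n,
    }

def tweet_text_features(text):
    """Length, hashtag count and mention count of a tweet's text."""
    return {
        "tweet_length": len(text),
        "hashtag_count": len(re.findall(r"#\w+", text)),
        "mention_count": len(re.findall(r"@\w+", text)),
    }

def follower_followee_ratio(followers_count, friends_count):
    """Followers divided by friends (assumed definition).

    An account that follows nobody is treated as following one account,
    which avoids division by zero.
    """
    return followers_count / max(friends_count, 1)
```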
Spearman correlation: features and retweet count
Feature                      Correlation
Media count *                -0.683
Mention count *              0.603
Favourites count * (log)     0.433
Sentiment *                  0.367
Tweet length *               0.363
Verified (binary) *          0.311
Hashtag count *              0.267
Tweet age *                  -0.259
Account age *                0.109
Friends count * (log)        0.077
Face presence (binary) *     -0.066
Status count * (log)         -0.059
Follower-Followee ratio *    -0.053
Follower count *             0.052
Ethem F Can, Hüseyin Oktay, and R Manmatha. Predicting retweet count using visual cues. International Conference on information & knowledge management, pages 1481–1484. ACM, 2013.
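Spearman correlation is the Pearson correlation of the ranks of the two variables, with tied values sharing their average rank. A minimal standard-library sketch is given below. The function names are hypothetical and no data from the study is reproduced.

```python
def average_ranks(values):
    """1-based ranks of values; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5

def spearman(feature_values, retweet_counts):
    """Spearman rank correlation between a feature and the retweet count."""
    return pearson(average_ranks(feature_values), average_ranks(retweet_counts))
```

Because only ranks matter, a monotonic transform of a feature such as a log scale leaves the coefficient unchanged.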
Regression results
Model (10-fold cross-validation)   RMSE   MAE
Linear Regression                  1.67   1.37
SVR (C = 8, gamma = 2)             2.41   2.16
Random Forest (#trees = 60)        1.10   0.72
• Mean retweet count: 2.731
• Data count: 2040
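The two error measures and the 10-fold protocol can be sketched as follows. To stay self-contained, the example scores a trivial baseline that predicts the mean of the training fold. It does not reproduce the three models in the table. The fold assignment (every k-th sample) and the pooling of out-of-fold predictions before scoring are assumptions.

```python
from math import sqrt

def rmse(actual, predicted):
    """Root mean squared error."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def k_fold_indices(n, k=10):
    """Yield (train, test) index lists; fold i holds out samples i, i+k, i+2k, ..."""
    for fold in range(k):
        test = list(range(fold, n, k))
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        yield train, test

def cross_validate_mean_baseline(targets, k=10):
    """RMSE and MAE of a predict-the-training-mean baseline under k-fold CV.

    Requires len(targets) > k so that every training fold is non-empty.
    """
    predictions = [0.0] * len(targets)
    for train, test in k_fold_indices(len(targets), k):
        fold_mean = sum(targets[i] for i in train) / len(train)
        for i in test:
            predictions[i] = fold_mean
    return rmse(targets, predictions), mae(targets, predictions)
```

A real model would replace the fold mean with a fit on the training indices and a prediction on the held-out indices.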
Conclusions
Improved ORB gives the best results, with accuracy above 90%
Accuracy drops if the input image is highly cropped or scaled (factor > ~3.5), or if the modified image has more colours
Low-level image features like mean red, green and blue do not correlate well with retweet count
Of the three models, Random Forest gives the best results
Acknowledgements
Department of Electronics and Information Technology (DeitY), for funding the work
Niharika Sachdeva, PhD, IIIT-Delhi
Committee Members
Precog & CERC members, family and friends
Bibliography
Ethem F Can, Hüseyin Oktay, and R Manmatha. Predicting retweet count using visual cues. In Proceedings of the 22nd ACM international Conference on information & knowledge management, pages 1481–1484. ACM, 2013.
Maximilian Jenders, Gjergji Kasneci, and Felix Naumann. Analyzing and predicting viral tweets. In Proceedings of the 22nd international conference on World Wide Web companion, pages 657–664. International World Wide Web Conferences Steering Committee, 2013.
Ethan Rublee, Vincent Rabaud, Kurt Konolige, and Gary Bradski. Orb: an efficient alternative to sift or surf. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 2564–2571. IEEE, 2011.
Bob van de Velde, Albert Meijer, and Vincent Homburg. Police message diffusion on twitter: analysing the reach of social media communications. Behaviour & Information Technology, 2015
Bibliography (I)
Lei Yu, Zhixin Yu, and Yan Gong. An improved orb algorithm of extracting and matching. 2015.
Hacker Factor. The hacker factor. http://www.hackerfactor.com/blog/index.php?/archives/529-Kind-of-Like-That.html, 2013.
Adrian Rosebrock. How-to: Python compare two images. http://www.pyimagesearch.com/2014/09/15/python-compare-two-images/, 2014.
Jain, Anil K., and Aditya Vailaya. "Image retrieval using color and shape." Pattern Recognition 29.8 (1996).