trondheim bigdata talk
DESCRIPTION
Talk at Trondheim Big Data meetup - "Geotemporal Social Data and Events in Multimedia"
TRANSCRIPT
Geo-Temporal-(Social?) Data? Social Events in Social Media
Massimiliano Ruocco
@ruoccoma, ruoccoma dot gmail dot com
Telenor Digital (SWEng), NTNU (PhD)
Who am I?
Digital Footprint
Social
Geographical
Temporal
Scenario 1 User visiting a tourist spot. Takes a picture of it. Posts it (+ comments) on FB/Flickr/Twitter.
Scenario 2 User watching a football match at the stadium. Takes a picture of the match (+ comments). Posts it on FB/Flickr/Twitter.
Scenario 3 User reading a newspaper. Comments on some trending news (e.g., the crisis in the Middle East). Posts it on Twitter.
Event! <<my trip in Naples>>
Event! <<semifinal CL>>
Event! <<crisis in middle east>>
Events in Social Media
From raw data to events
Flickr as data source
- 250M+ geotagged pictures
- 3.5M uploaded/day
- 87M users
- 6,000M pics
POI-related Tag Extraction
POI-related Tag Extraction
Tag Point Pattern: geographic distribution of the pictures tagged with a certain term
Point Process Theory: an extensive, rigorous statistical framework
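A tag point pattern, as described above, can be sketched as a simple filter over geotagged photo records. This is a minimal illustration only: the record layout (`lat`, `lon`, `tags`), the function name and the sample coordinates are assumptions made for the example, not the format used in the talk.

```python
def tag_point_pattern(photos, tag):
    """Return the (lat, lon) points of all photos that carry `tag`."""
    wanted = tag.lower()
    return [
        (photo["lat"], photo["lon"])
        for photo in photos
        if wanted in (t.lower() for t in photo["tags"])
    ]


# Hypothetical sample records; real photo metadata has many more fields.
photos = [
    {"lat": 51.4826, "lon": -0.0077, "tags": ["greenwich", "london"]},
    {"lat": 51.4834, "lon": -0.0060, "tags": ["Greenwich", "university"]},
    {"lat": 40.8518, "lon": 14.2681, "tags": ["naples", "trip"]},
]

pattern = tag_point_pattern(photos, "greenwich")
print(pattern)
```

Matching is case-insensitive here, which is a design choice of the sketch rather than something stated on the slides.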
POI-related Tag Extraction
Point Pattern Analysis
Objective: determine whether a given set of spatial points (a Spatial Point Pattern) exhibits clustering or regularity, or is randomly distributed, within an area A
POI-related Tag Extraction
Ripley's K-function: summarizes a spatial point pattern over a scale h
CSR (complete spatial randomness) Test on K(h)
- K(h) > πh²: clustering at scale h
- K(h) < πh²: dispersion at scale h
CSR Test on D(h)
- D(h) > h: clustering at scale h
- D(h) < h: dispersion at scale h
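The K(h) comparison above can be sketched with a naive estimator. This is an illustration under stated assumptions, not the implementation used in the talk: it applies no edge correction, uses the common normalisation area / (n(n-1)), and works on made-up planar coordinates in a unit square. The function names are hypothetical.

```python
import math


def ripley_k(points, h, area):
    """Naive Ripley's K estimate at scale h (no edge correction)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    # Count ordered pairs (i, j), i != j, that lie within distance h.
    close_pairs = sum(
        1
        for i, (xi, yi) in enumerate(points)
        for j, (xj, yj) in enumerate(points)
        if i != j and math.hypot(xi - xj, yi - yj) <= h
    )
    return area * close_pairs / (n * (n - 1))


def csr_verdict(points, h, area):
    """Compare K(h) with pi * h^2, the value expected under CSR."""
    k = ripley_k(points, h, area)
    expected = math.pi * h ** 2
    if k > expected:
        return "clustering"
    if k < expected:
        return "dispersion"
    return "consistent with CSR"


# Toy patterns in the unit square (area = 1.0).
cluster = [(0.5 + 0.001 * i, 0.5) for i in range(20)]
grid = [(0.05 + 0.1 * i, 0.05 + 0.1 * j) for i in range(10) for j in range(10)]

print(csr_verdict(cluster, 0.05, 1.0))
print(csr_verdict(grid, 0.05, 1.0))
```

In practice the comparison is usually made against simulation envelopes rather than by a strict inequality, so the hard thresholds here are a simplification.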
POI-related Tag Extraction
Ripley's Cross-K-function: summarizes the spatial correlation between two tag point patterns over a scale h
Fig - Spatial distribution of the tag point patterns for the tags Old Naval College and University of Greenwich at two different zoom levels
POI-related Tag Extraction
Ripley's Cross-K-function: summarizes the spatial correlation between two tag point patterns over a scale h
CSR Test on L12(h)
- L12(h) > 0: attraction at scale h
- L12(h) < 0: repulsion at scale h
Fig - Spatial distribution of the tag point patterns for the tags Old Naval College and University of Greenwich
POI-related Tag Extraction
Ripley's Cross-K-function: summarizes the spatial correlation between two tag point patterns over a scale h
CSR Test on K12(h)
- K12(h) > πh²: attraction at scale h
- K12(h) < πh²: repulsion at scale h
Fig - Spatial distribution of the tag point patterns for the tags Old Naval College and University of Greenwich
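The two cross-pattern tests above can be sketched as follows. This is a naive estimate without edge correction, and it assumes that L12(h) is the usual rescaling sqrt(K12(h) / π) - h; the slides do not spell out the definition, so treat that as an assumption. Function names and coordinates are made up for illustration.

```python
import math


def cross_k(points_1, points_2, h, area):
    """Naive cross-K estimate: pairs (p1, p2) within distance h."""
    n1, n2 = len(points_1), len(points_2)
    if n1 == 0 or n2 == 0:
        raise ValueError("both patterns must be non-empty")
    close_pairs = sum(
        1
        for (x1, y1) in points_1
        for (x2, y2) in points_2
        if math.hypot(x1 - x2, y1 - y2) <= h
    )
    return area * close_pairs / (n1 * n2)


def cross_l(points_1, points_2, h, area):
    """Assumed rescaling: L12(h) = sqrt(K12(h) / pi) - h."""
    return math.sqrt(cross_k(points_1, points_2, h, area) / math.pi) - h


# Toy patterns in the unit square: two tags photographed side by side,
# and a third tag photographed far away from the first.
tag_a = [(0.50 + 0.001 * i, 0.50) for i in range(10)]
tag_b = [(0.50, 0.51 + 0.001 * i) for i in range(10)]
tag_far = [(0.90, 0.90 + 0.001 * i) for i in range(10)]

print(cross_l(tag_a, tag_b, 0.05, 1.0))    # positive: attraction
print(cross_l(tag_a, tag_far, 0.05, 1.0))  # negative: repulsion
```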
POI-related Tag Extraction
Objective
Derive indicators estimating the clustering tendency of a tag point pattern
Applications
1 - Extracting/ranking social tags that indicate geographical POIs
2 - Enhancing query expansion in combination with other metadata
POI-related Tag Extraction
Real Data: Challenges
- Size
- Inhomogeneity
Data Inhomogeneity
Fig - Example of the point pattern of the tag night
Data Inhomogeneity
Fig - Related underlying picture point pattern
Data Inhomogeneity
Fig - Example of the point pattern of the tag night over the underlying distribution
POI-related Tag Extraction
Real Data: Challenges
Size
1 - Subsampling
2 - big.matrix (bigmemory)* and biganalytics** (R)
Inhomogeneity
- Case-Control Analysis
*Kane M., Emerson J., "The R Package bigmemory: Supporting Efficient Computation and Concurrent Programming with Large Data Sets" (2010). Journal of Statistical Software.
**Kane M. et al., "Scalable Strategies for Computing with Massive Data" (2013), Journal of Statistical Software.
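The two remedies named on this slide can be sketched in a few lines. Subsampling is shown as a seeded random sample. For case-control analysis the sketch uses one common formulation, the difference between the K-function of the tag pattern (cases) and that of the underlying picture pattern (controls). Whether the talk uses exactly this difference is an assumption, and all names and toy coordinates are hypothetical.

```python
import math
import random


def subsample(points, max_points, seed=0):
    """Seeded random subsample that keeps at most max_points points."""
    if len(points) <= max_points:
        return list(points)
    return random.Random(seed).sample(points, max_points)


def ripley_k(points, h, area):
    """Naive Ripley's K estimate at scale h (no edge correction)."""
    n = len(points)
    close_pairs = sum(
        1
        for i, (xi, yi) in enumerate(points)
        for j, (xj, yj) in enumerate(points)
        if i != j and math.hypot(xi - xj, yi - yj) <= h
    )
    return area * close_pairs / (n * (n - 1))


def case_control_difference(cases, controls, h, area):
    """K_cases(h) - K_controls(h): clustering beyond the background."""
    return ripley_k(cases, h, area) - ripley_k(controls, h, area)


# Controls: all pictures (a regular grid here). Cases: pictures with one tag.
all_pictures = [(0.05 + 0.1 * i, 0.05 + 0.1 * j) for i in range(10) for j in range(10)]
tag_pictures = [(0.5 + 0.001 * i, 0.5) for i in range(20)]

sample = subsample(all_pictures, 30, seed=1)
print(len(sample))
print(case_control_difference(tag_pictures, all_pictures, 0.05, 1.0))
```

A tag whose pictures simply follow the background distribution gives a difference near zero, which is the point of comparing against controls instead of against complete spatial randomness.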
POI-related Tag Extraction
Derived Geo-Features
1 - Area under K(h) over the considered scale range
2 - Maximum value of K(h) over the scale range
Fig - Set 1 and Set 2 (tags shown: wwt, grdstreeteatportlan, britishlibrary, astoria)
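The two derived geo-features can be sketched from a sampled K(h) curve. The sketch assumes the area is taken with the trapezoidal rule and that tags are ranked in descending order of the chosen feature. The slides do not give these details, and the curves below are invented numbers under hypothetical tag names.

```python
def area_under_curve(scales, values):
    """Trapezoidal-rule area under a curve sampled at `scales`."""
    return sum(
        (scales[i + 1] - scales[i]) * (values[i] + values[i + 1]) / 2.0
        for i in range(len(scales) - 1)
    )


def geo_features(scales, k_values):
    """Return the two features: area under K(h) and maximum of K(h)."""
    if len(scales) != len(k_values) or len(scales) < 2:
        raise ValueError("need two or more matching samples")
    return {
        "area": area_under_curve(scales, k_values),
        "max_value": max(k_values),
    }


def rank_tags(curves, scales, feature):
    """Rank tags by one feature, highest first."""
    return sorted(
        curves,
        key=lambda tag: geo_features(scales, curves[tag])[feature],
        reverse=True,
    )


# Invented K(h) samples for two hypothetical tags.
scales = [0.0, 1.0, 2.0]
curves = {
    "generic_tag": [0.0, 1.0, 4.0],
    "poi_tag": [0.0, 4.0, 6.0],
}

print(geo_features(scales, curves["generic_tag"]))
print(rank_tags(curves, scales, "area"))
```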
POI-related Tag Extraction
Table - Top-5 extracted tags ranked by MaxValue and Area
Event-related Image Search
Geo(Temporal)-tagged resources supporting IR
Event-related Image Search
Expansion term selection over three dimensions
Text Features (baseline)
- TF, IDF, DF
Time Features
- Kurtosis: peakedness
- Autocorrelation: randomness
- Cross-correlation
Geo Features
- Good expansion term = spatially correlated with qi
- Calculated for each tile Tqi,e
- Derived from the q-point pattern, the e-point pattern and the (q+e)-point pattern
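The time features listed above can be sketched on a term's usage histogram (for example, pictures per day carrying the term). The definitions below are the textbook ones: Pearson kurtosis m4 / m2², lag-k autocorrelation, and lag-0 cross-correlation as a Pearson coefficient. Whether the talk uses exactly these variants is an assumption, and the series are made-up numbers.

```python
import math


def kurtosis(series):
    """Pearson kurtosis m4 / m2^2; higher means a more peaked series."""
    n = len(series)
    mean = sum(series) / n
    m2 = sum((x - mean) ** 2 for x in series) / n
    m4 = sum((x - mean) ** 4 for x in series) / n
    return m4 / m2 ** 2 if m2 > 0 else 0.0


def autocorrelation(series, lag=1):
    """Lag-k autocorrelation; values near 0 suggest a random series."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        return 0.0
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / denom


def cross_correlation(series_a, series_b):
    """Lag-0 cross-correlation (Pearson coefficient) of two series."""
    n = len(series_a)
    mean_a = sum(series_a) / n
    mean_b = sum(series_b) / n
    num = sum((a - mean_a) * (b - mean_b) for a, b in zip(series_a, series_b))
    denom = math.sqrt(
        sum((a - mean_a) ** 2 for a in series_a)
        * sum((b - mean_b) ** 2 for b in series_b)
    )
    return num / denom if denom > 0 else 0.0


# Made-up daily counts: an event-like burst versus a slow seasonal bump.
burst = [0, 0, 0, 10, 0, 0, 0]
bump = [1, 2, 3, 4, 3, 2, 1]

print(kurtosis(burst), kurtosis(bump))
print(autocorrelation([1, 2, 3, 4, 5, 6]))
print(cross_correlation(burst, bump))
```

An event-related term is expected to show a burst in time, which is why a peakedness measure such as kurtosis is a natural candidate feature.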
Event-related Image Search
Scalability?
Which tile?
1 - Best tile + calculate confidence value
2 - Combine confidence values from different tiles
Map-Reduce fashion + Solr search engine
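The tiling idea can be sketched as a map step that assigns each picture to a fixed-size tile and a reduce step that aggregates per tile. This is an in-memory illustration only: the tile size, the function names and the use of a plain picture count to pick the "best" tile are assumptions, and no Solr or Hadoop API is involved.

```python
import math
from collections import defaultdict


def tile_of(lat, lon, tile_deg):
    """Index of the square tile (tile_deg degrees wide) containing a point."""
    return (
        math.floor((lat + 90.0) / tile_deg),
        math.floor((lon + 180.0) / tile_deg),
    )


def map_phase(photos, tile_deg):
    """Map: emit ((tile, tag), 1) for every (lat, lon, tag) record."""
    for lat, lon, tag in photos:
        yield (tile_of(lat, lon, tile_deg), tag), 1


def reduce_phase(pairs):
    """Reduce: sum the emitted values per (tile, tag) key."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


def best_tile(counts, tag):
    """Stand-in criterion: the tile holding most pictures with the tag."""
    candidates = {tile: c for (tile, t), c in counts.items() if t == tag}
    return max(candidates, key=candidates.get) if candidates else None


# Hypothetical (lat, lon, tag) records.
photos = [
    (63.43, 10.39, "trondheim"),
    (63.44, 10.40, "trondheim"),
    (59.91, 10.75, "trondheim"),
    (40.85, 14.27, "naples"),
]

counts = reduce_phase(map_phase(photos, tile_deg=10.0))
print(best_tile(counts, "trondheim"))
```

Because every record is keyed by its tile, each tile can be processed independently, which is what makes the per-tile feature computation easy to distribute.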
Event-related Image Search Results
Table – Comparison of the classification performances. The best scores in each column are typeset in boldface.
Event-related Image Search Results
Fig – Comparison of MAP improvements as a function of the number of feedback docs
Yes! But… Big Data?
• big.matrix (bigmemory) + biganalytics in R
• Subsampling
• World map divided into tiles
• Map-Reduce fashion algorithm
Cool stuff! Increasing volume of Geo-Temporal Data from Social Media ++
Amazing things!
– Visualization
– Location-based recommendation
– Discovering trends!
Thanks! Questions?
M. Ruocco and H. Ramampiaro (2014), "Geo-Temporal Distribution of Tag Terms for Event-Related Image Retrieval". In Information Processing & Management Journal (IPM). Elsevier.
M. Ruocco and H. Ramampiaro (2014), "A Scalable Algorithm for Extraction and Clustering of Event-Related Pictures". In Multimedia Tools and Applications Journal (MTAP). Springer.
M. Ruocco and H. Ramampiaro (2013), "Exploring Temporal Proximity and Spatial Distribution of Terms in Web-based Search of Event-Related Images". In Proc. of the 24th ACM Conference on Hypertext and Social Media (HT 2013). ACM Press.
M. Ruocco and H. Ramampiaro (2012), "Exploratory Analysis on Heterogeneous Tag-Point Patterns for Ranking and Extracting Hot-Spot Related Tags". Proceedings of the 5th ACM SIGSPATIAL International Workshop on Location-Based Social Networks (LBSN 2012). ACM Press.
POI-related Tag Extraction
Evaluation of top-100 extracted tags: P@n
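P@n (precision at n) is the fraction of the top n ranked items that are judged relevant. A minimal sketch, with made-up tag names and relevance judgments rather than results from the talk:

```python
def precision_at_n(ranked_tags, relevant_tags, n):
    """Fraction of the top-n ranked tags that are judged relevant."""
    if n <= 0:
        raise ValueError("n must be positive")
    hits = sum(1 for tag in ranked_tags[:n] if tag in relevant_tags)
    return hits / n


# Hypothetical ranking and judgments (which tags really denote a POI).
ranked = ["tag_a", "tag_b", "tag_c", "tag_d"]
relevant = {"tag_a", "tag_c"}

for n in (1, 2, 4):
    print(n, precision_at_n(ranked, relevant, n))
```

The sketch divides by n even when fewer than n tags are ranked, which is the usual convention.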