MULTIMEDIA ANALYTICS: SYNERGY BETWEEN HUMAN AND MACHINE BY VISUALIZATION
Marcel Worring, Jan Zahalka, Stevan Rudinac
Intelligent Systems Lab Amsterdam, University of Amsterdam, Amsterdam Data Science
INTRODUCTION
- Multimedia data increasingly important
- Valuable sources of knowledge, for example:
  - Forensics: analyze multimedia data for evidence of ISIS involvement
  - Travel industry: analyze social media data to map trending places of interest
  - ...
MULTIMEDIA AS A KNOWLEDGE SOURCE
- Night Watch by Rembrandt. How to describe it?
- Art? Painting? People? Military unit? Amsterdam? ...
- Content, technical parameters, geolocation, ...
- Description depends on context provided by the analyst
- Analyst needs to interact with the system
- Multimedia items contain multiple types of data: image, tags, comments, metadata, ...
- Integrating them improves the information gain
- What if we have millions of images, tags, metadata, ...?
- Intelligent navigation capabilities required from the system
MULTIMEDIA ANALYTICS
- How do we move towards interactive, intelligent, and integrated multimedia systems?
- Possible answer: multimedia analytics
[Figure: Multimedia Analytics shown in relation to Multimedia Analysis, InfoVis, and Visual Analytics]
RELATED WORK
- Extensive survey work involving ~800 references
- Covered relevant work from the last 10 years:
  - Multimedia analytics
  - Multimedia visualization
  - Information visualization
  - Visual analytics
  - Automated multimedia analysis
- Multimedia Analytics Article Library (MAAL):
  - staff.fnwi.uva.nl/j.zahalka/maal.html
  - 374 catalogued references
PIPELINE
[Figure: pipeline diagram linking Data (multimedia collection: images, annotations, metadata), Model, Visualization, and Knowledge (example categories: "people", 61 items; "nature", 93 items), with interactive model update and navigation directions]
- Multimedia instantiation of the visual analytics process (Keim et al., Visual analytics: Scope and challenges, 2008)
TASK MODEL
[Figure: task model with an exploration-search axis running from Start to End, and Categorization]
- Exploration: uncovering the overall structure
- Search: finding particular items
- Exploration-search axis: E-S ratio changes dynamically
- Mental model attributes: semantic → categorical
CATEGORIZATION
- Categorization: assigning individual multimedia items to categories defined by the analyst
CHALLENGE: THE GAPS
- Semantic gap:
  - Human: complex and abstract semantics; recognized instantly; put in context
  - Machine: limited semantics; takes time, computationally costly; no context
- Pragmatic gap:
  - Human: new categories on the fly; non-exclusive categories; dynamic category semantics
  - Machine: static number of classes; exclusive classes; static class semantics
- Multimedia analysis capabilities are very different for humans and machines
- Semantic gap [Smeulders et al. 2000]: richness of semantics
- Pragmatic gap (our work): flexibility of the model
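The flexibility that the pragmatic gap refers to can be illustrated with a small data structure. The sketch below is only an illustration, not the authors' system: `CategoryStore` and its method names are hypothetical. It shows analyst-defined categories that can be created on the fly and are non-exclusive, so one item may belong to several categories at once.

```python
from collections import defaultdict


class CategoryStore:
    """Hypothetical analyst-side category model (not taken from the talk)."""

    def __init__(self):
        # category name -> set of item ids; a category appears on first use
        self._members = defaultdict(set)

    def assign(self, item_id, category):
        """Add an item to a category, creating the category if needed."""
        self._members[category].add(item_id)

    def unassign(self, item_id, category):
        """Remove an item from a category, e.g. when its meaning shifts."""
        self._members[category].discard(item_id)

    def categories_of(self, item_id):
        """Return every category holding the item (non-exclusive membership)."""
        return sorted(c for c, items in self._members.items() if item_id in items)


store = CategoryStore()
store.assign("night_watch.jpg", "art")
store.assign("night_watch.jpg", "military unit")  # same item, second category
store.assign("canal.jpg", "Amsterdam")            # new category on the fly
```

A classifier with a fixed set of mutually exclusive classes would need retraining to support any of these three operations, which is one way to read the contrast drawn on the slide.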
SIMILARITY BROWSER
FORK BROWSER
PHOTO CUBE
MULTIMEDIA PIVOT TABLES
STATE OF THE ART
[Figure: systems positioned along the semantic gap and the pragmatic gap (each axis: Limited, Intermediate, Advanced), with a Goal marker. Systems shown: I-SI, Newdles, Visit, Informedia, Canopy, Similarity browser, INA browser, MediaTable]
- Systems advance with respect to the gaps
- Algorithms and techniques allow realization of our model
INSTANTIATING THE MODEL
NEW YORKER MELANGE
- Interactive New York venue recommender
- “Explore the city through the eyes of social media users that share interests with you.”
- newyorkermelange.com
- ACM Multimedia Grand Challenge 2014, 1st Prize
NEW YORKER MELANGE: INGREDIENTS
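[Figure: New Yorker Melange ingredients: visual and text features for venues and users; grid and map visualizations; an SVM model; interesting venues to visit as the outcome. The user indicates relevant users and venues, and the system suggests more relevant users and venues. NY Melange is placed on the exploration-search axis]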
NEW YORKER MELANGE
DATASET
[Figure: data collection: New York venues, query Q(venue name, geo), venue images with metadata]
- >1M New York venue images with metadata
- Real dataset with a purpose
- Query strategy designed to reduce noise
- Exploitable size-noise tradeoff
- Each image has a venue category label → ready for classification
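The slides do not spell out how the query Q(venue name, geo) is evaluated. One plausible reading, sketched below purely as an assumption, is to keep an image only when its text mentions the venue name and its geotag lies within a small radius of the venue. The function names, the dictionary fields, and the 0.5 km default are all hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def matches_venue(image, venue, max_km=0.5):
    """Assumed reading of Q(venue name, geo): name match AND geo proximity."""
    name_hit = venue["name"].lower() in image["text"].lower()
    near = haversine_km(image["lat"], image["lon"], venue["lat"], venue["lon"]) <= max_km
    return name_hit and near


# Made-up example records, for illustration only.
venue = {"name": "Central Park", "lat": 40.7829, "lon": -73.9654}
images = [
    {"text": "Picnic in Central Park", "lat": 40.7830, "lon": -73.9650},
    {"text": "Central Park poster at home", "lat": 40.6500, "lon": -73.9500},
    {"text": "Lunch break", "lat": 40.7829, "lon": -73.9654},
]
kept = [img for img in images if matches_venue(img, venue)]
```

Under this reading, widening `max_km` or dropping the name check yields more images at the cost of more noise, which is one way the size-noise tradeoff could be exploited.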
VENUE/USER TOPICS
[Figure: topic extraction: dataset (images and annotations from Foursquare, Flickr, and Picasa) → features (ConvNet: 1000 visual concepts; LDA: 100 latent topics) → clustering → venue topics (visual, text) and user topics (visual, text)]
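The figure shows per-image features being turned into venue-level and user-level representations. How the features are aggregated is not stated, so the sketch below assumes simple mean pooling of per-image feature vectors, grouped once by venue and once by user. The function `mean_pool` and the record fields are hypothetical, and the clustering step from the figure is not shown.

```python
from collections import defaultdict


def mean_pool(records, key):
    """Average the feature vectors of all images sharing the same `key` value."""
    sums, counts = {}, defaultdict(int)
    for rec in records:
        group = rec[key]
        vec = rec["features"]
        if group not in sums:
            sums[group] = [0.0] * len(vec)
        sums[group] = [s + v for s, v in zip(sums[group], vec)]
        counts[group] += 1
    return {g: [s / counts[g] for s in total] for g, total in sums.items()}


# Toy 3-dimensional scores standing in for the 1000 visual concepts.
images = [
    {"venue": "v1", "user": "u1", "features": [1.0, 0.0, 0.0]},
    {"venue": "v1", "user": "u2", "features": [0.0, 1.0, 0.0]},
    {"venue": "v2", "user": "u1", "features": [0.0, 0.0, 1.0]},
]
venue_vectors = mean_pool(images, "venue")
user_vectors = mean_pool(images, "user")
```

The same pooling would apply unchanged to the 100-dimensional text topic vectors, giving each venue and each user one visual and one text representation.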
USER PREFERENCE LEARNING
[Figure: user preference learning loop. Elements: initial interface; positives (+relevant venues, +relevant users); negatives (initially empty; random sample of user topics; +non-relevant users); linear SVM; user ranking; venue selection over venue topics; map interface]
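The loop in the figure trains a linear SVM on positive and negative examples and uses it to rank users. The sketch below is a pure-Python stand-in for that step, assuming stochastic subgradient descent on the regularised hinge loss; it is not the authors' implementation, and a real system would more likely call a library SVM. All names, hyperparameters, and topic values are hypothetical.

```python
import random


def train_linear_svm(positives, negatives, epochs=200, lr=0.1, reg=0.01, seed=0):
    """Fit w, b by stochastic subgradient descent on the L2-regularised hinge loss."""
    rng = random.Random(seed)
    data = [(x, 1.0) for x in positives] + [(x, -1.0) for x in negatives]
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Regularisation shrinks w on every step.
            w = [wi * (1.0 - lr * reg) for wi in w]
            if margin < 1.0:  # example violates the margin: step towards it
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def rank(candidates, w, b):
    """Order candidate ids by decreasing SVM score."""
    def score(cid):
        return sum(wi * xi for wi, xi in zip(w, candidates[cid])) + b
    return sorted(candidates, key=score, reverse=True)


# Toy topic vectors (2 topics instead of 100); all values are made up.
users = {"u1": [0.9, 0.1], "u2": [0.8, 0.2], "u3": [0.1, 0.9], "u4": [0.2, 0.8]}
positives = [users["u1"]]  # marked relevant by the analyst
negatives = [users["u3"]]  # e.g. a random sample, or marked non-relevant
w, b = train_linear_svm(positives, negatives)
unlabelled = {k: v for k, v in users.items() if k in ("u2", "u4")}
ranking = rank(unlabelled, w, b)
```

In an interactive setting the model would be retrained after every round, with the analyst's new relevant and non-relevant picks appended to `positives` and `negatives`.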
EVALUATION: SCHEME
- Real user data
- 25% of the visited venues withheld, rest used to seed the system
- 10 interaction rounds
- Measure: average recall of the withheld venues
- Only exact withheld venues count as a match
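The scheme above can be sketched as a split step plus a recall computation. This is a minimal illustration under stated assumptions: the function names are hypothetical, the split is a seeded random shuffle, and recall in round r is computed over the cumulative set of venues suggested up to that round, which the slides do not specify.

```python
import random


def split_visits(visited, withheld_fraction=0.25, seed=0):
    """Withhold a fraction of a user's visited venues; the rest seeds the system."""
    venues = sorted(visited)
    random.Random(seed).shuffle(venues)
    n_withheld = max(1, round(len(venues) * withheld_fraction))
    return venues[n_withheld:], venues[:n_withheld]  # (seed set, withheld set)


def recall(recommended, withheld):
    """Fraction of withheld venues recovered; only exact venue matches count."""
    return len(set(recommended) & set(withheld)) / len(withheld)


def average_recall_per_round(recommendations_by_user, withheld_by_user, rounds=10):
    """Average recall over users after each interaction round.

    recommendations_by_user[u][r] is the cumulative list of venues suggested
    to user u up to and including round r (an assumption of this sketch).
    """
    users = list(withheld_by_user)
    return [
        sum(recall(recommendations_by_user[u][r], withheld_by_user[u]) for u in users)
        / len(users)
        for r in range(rounds)
    ]


seed_set, withheld = split_visits(["a", "b", "c", "d", "e", "f", "g", "h"])
```

Because only exact venue matches count, a suggestion for a very similar venue next door scores zero, so the measure is a strict one.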
EVALUATION: RESULTS
[Figure: average recall (y-axis, 0 to 0.35) per interaction round (x-axis, 1 to 10) for Baseline, NYM-V, NYM-T, and NYM-VT]
FUTURE OF MELANGE: SOFTWARE
- Consolidated Melange deployable everywhere
  - Amsterdam
  - Hong Kong
  - Beijing
  - Washington, D.C.
  - Prague
  - Rennes
  - ...
CONCLUSION
- A model of multimedia analytics integration, tasks, and challenges
  - Based on extensive survey work
  - Multimedia Analytics Article Library: staff.fnwi.uva.nl/j.zahalka/maal.html
- Current state-of-the-art techniques allow realization
  - Ample research opportunities in closing the gaps
- Model already successfully instantiated
  - New Yorker Melange: newyorkermelange.com
[Figure: images, text, metadata]