Bridge Semantic Gap: A Large Scale Concept Ontology for Multimedia (LSCOM)
Posted on 19-Jan-2016
Guo-Jun Qi, Beckman Institute, University of Illinois at Urbana-Champaign
LSCOM (Large Scale Concept Ontology for Multimedia)
A broadcast news video dataset
200+ news videos / 170 hours
61,901 shots
Languages
◦ English/Arabic/Chinese
Why a Broadcast News Ontology?
Critical mass of users, content providers, and applications
Good content availability (TRECVID, LDC, FBIS)
Shares a large set of core concepts with other domains
LSCOM Provides
Richly annotated video content for accomplishing the required access and analysis functions over massive amounts of video content
A large-scale, useful, well-defined semantic lexicon
◦ More than 3000 concepts
◦ 374 annotated concepts
◦ Bridging the semantic gap from low-level features to high-level concepts
An LSCOM Concept
000 - Parade
Concept ID: 000
Name: Parade
Definition: Multiple units of marchers, devices, bands, banners or music.
Labeled: Yes
LSCOM Hierarchy (http://www.lscom.org/ontology/index.html)
Thing
.Individual
..Dangerous_Thing
...Dangerous_Situation
....Emergency_Incident
.....Disaster_Event
......Natural_Disaster
....Natural_Hazard
.....Avalanche
.....Earthquake
.....Mudslide
.....Natural_Disaster
.....Tornado
...Dangerous_Tangible_Thing
....Cutting_Device
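In the hierarchy excerpt above, each leading dot marks one level of depth, and Natural_Disaster appears under two different parents, so the ontology behaves like a graph rather than a strict tree. The Python sketch below is a minimal illustration, assuming the dotted-outline format shown here; the helper names parse_outline and parent_paths are hypothetical and not part of any LSCOM tool. It recovers every root-to-node path and lists all parent paths of a concept.

```python
# Dotted-outline excerpt of the LSCOM hierarchy: one leading dot per level.
OUTLINE = """Thing
.Individual
..Dangerous_Thing
...Dangerous_Situation
....Emergency_Incident
.....Disaster_Event
......Natural_Disaster
....Natural_Hazard
.....Avalanche
.....Earthquake
.....Mudslide
.....Natural_Disaster
.....Tornado
...Dangerous_Tangible_Thing
....Cutting_Device"""


def parse_outline(text):
    """Return one root-to-node path (a tuple of names) per outline line."""
    stack = []   # names on the path from the root to the current node
    paths = []
    for line in text.splitlines():
        name = line.lstrip(".")
        depth = len(line) - len(name)  # number of leading dots
        del stack[depth:]              # pop back up to this node's parent
        stack.append(name)
        paths.append(tuple(stack))
    return paths


def parent_paths(paths, concept):
    """All paths leading to a concept; more than one means multiple parents."""
    return [path[:-1] for path in paths if path[-1] == concept]


paths = parse_outline(OUTLINE)
print(parent_paths(paths, "Tornado")[0][-1])         # Natural_Hazard
print(len(parent_paths(paths, "Natural_Disaster")))  # 2
```

Keeping full paths rather than a single parent pointer is what lets a concept such as Natural_Disaster keep both of its parents.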
Definition: What Is an Ontology? (Wikipedia)
An ontology is a formal representation of knowledge as a set of concepts within a domain and the relationships between those concepts. It is used to reason about the properties of that domain, and may be used to describe the domain.
Ontology
Represents the visual knowledge base in a structured way
◦ Graph structure
◦ Tree (hierarchy) structure
Images/videos can be learned and retrieved more effectively by exploiting the coherence between concepts
◦ Logical coherence
◦ Statistical coherence
An Ontology Hierarchy: Military Vehicle
An Example from Wikipedia
Ontology Tree for LSCOM
A Light Scale Concept Ontology for Multimedia Understanding (LSCOM-Lite)
The aim is to partition the semantic space using a small number of concepts (39 concepts).
Selection Criteria
◦ Semantic coverage: the light concept set should cover as many of the semantic concepts in news videos as possible.
◦ Compactness: these concepts should not semantically overlap.
◦ Modelability: these concepts should be modelable with a smaller semantic gap.
Selected Concept Dimensions
Divide the semantic space into a multi-dimensional space whose dimensions are nearly orthogonal:
◦ Program Category
◦ Setting/Scene/Site
◦ People
◦ Objects
◦ Activities
◦ Events
◦ Graphics
Histogram of LSCOM-Lite Concepts
Some Example Keyframes
Applications
Application I: Conceptual Fusion (most basic: early fusion)
Application II: Cross-Category Classification (inter-class relations)
Application III: Event Dynamics in Concept Space
Application I: Conceptual Fusion
[Figure: diagram linking Video, Visual Features, Classifier, and Concept 1, Concept 2, Concept 3, …, Concept n]
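The conceptual fusion diagram can be read as a two-stage pipeline: per-concept classifiers map a shot's visual features to concept scores, and the resulting vector of concept scores becomes a single feature vector for a downstream classifier. The Python sketch below assumes toy detectors written as plain functions and swaps in a nearest-centroid classifier for the second stage purely for brevity; the detectors, the feature layout, and the class names are hypothetical stand-ins, not the LSCOM 374 SVM models.

```python
from math import dist  # Euclidean distance, Python 3.8+


def concept_vector(shot_features, detectors):
    """Stage 1: map one shot's low-level features to a vector of concept scores."""
    return [detect(shot_features) for detect in detectors]


def train_centroids(vectors, labels):
    """Stage 2 (stand-in): average the concept vectors of each class."""
    sums, counts = {}, {}
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, score in enumerate(vec):
            acc[i] += score
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def classify(vec, centroids):
    """Assign the class whose centroid is nearest in concept space."""
    return min(centroids, key=lambda label: dist(vec, centroids[label]))


# Hypothetical detectors over toy features [greenness, blueness, face_count].
detectors = [
    lambda f: f[0],                 # "outdoor" score
    lambda f: f[1],                 # "water" score
    lambda f: min(f[2] / 10, 1.0),  # "crowd" score, capped at 1
]
train = [([0.9, 0.1, 0], "landscape"), ([0.8, 0.2, 1], "landscape"),
         ([0.2, 0.1, 9], "parade"), ([0.3, 0.0, 8], "parade")]
vectors = [concept_vector(f, detectors) for f, _ in train]
centroids = train_centroids(vectors, [label for _, label in train])
print(classify(concept_vector([0.25, 0.05, 12], detectors), centroids))  # parade
```

Any classifier that accepts fixed-length vectors could replace the centroid step; the point is only that the concept scores, not the raw pixels, form the input to the second stage.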
LSCOM 374 Models
374 LIBSVM models
◦ http://www.ee.columbia.edu/ln/dvmm/columbia374/
◦ Features used (MPEG-7 descriptors): Color Moments, Edge Histogram, Wavelet Texture
◦ LIBSVM: a library for support vector machines, at http://www.csie.ntu.edu.tw/~cjlin/libsvm/
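Color moments, one of the three listed features, summarize a color distribution by its first three moments per channel. A minimal Python sketch, assuming a single global descriptor computed over (c1, c2, c3) pixel tuples in whatever color space is chosen; the grid layout and color space actually used for the Columbia374 features are not given here, so treat this as an illustration of the moments only.

```python
import math


def color_moments(pixels):
    """First three color moments per channel: mean, standard deviation, skewness.

    pixels: non-empty list of (c1, c2, c3) tuples.
    Returns 9 floats: [mean_1, std_1, skew_1, mean_2, std_2, skew_2, ...].
    """
    n = len(pixels)
    features = []
    for ch in range(3):
        values = [p[ch] for p in pixels]
        mean = sum(values) / n
        var = sum((v - mean) ** 2 for v in values) / n
        third = sum((v - mean) ** 3 for v in values) / n
        # Skewness here is the signed cube root of the third central moment,
        # so all three moments share the units of the pixel values.
        skew = math.copysign(abs(third) ** (1 / 3), third)
        features.extend([mean, math.sqrt(var), skew])
    return features


print(color_moments([(0, 0, 0), (255, 255, 255)]))
```

Nine numbers per region keep the descriptor compact, which is one reason color moments are a common cheap baseline next to histogram-style features.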
Application II: Cross-Category Classification with Concept Transfer
G.-J. Qi et al. Towards Cross-Category Knowledge Propagation for Learning Visual Concepts, in CVPR 2011
Instance-Level Concept Correlation
[Figure: example images labeled +1/-1 for the concepts Mountain and Castle: Mountain and castle; Castle only; Mountain only]
Transfer Function
[Figure: joint label cases: Mountain, Castle; Mountain; Castle; None of them]
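One simple way to see the instance-level correlation between two concepts is to count the four joint cases (both, Mountain only, Castle only, neither) from per-image +1/-1 labels and summarize them with the phi coefficient. This Python sketch is only that co-occurrence statistic, offered for intuition; it is not the transfer function from Qi et al., and the toy labels are made up for illustration.

```python
from collections import Counter
from math import sqrt


def joint_label_counts(labels_a, labels_b):
    """Count the four joint cases for two concepts labeled +1/-1 on the same images."""
    return Counter(zip(labels_a, labels_b))


def phi_coefficient(labels_a, labels_b):
    """Correlation of two binary concepts: +1 if labels always agree, -1 if always opposite."""
    c = joint_label_counts(labels_a, labels_b)
    both, a_only = c[(1, 1)], c[(1, -1)]
    b_only, neither = c[(-1, 1)], c[(-1, -1)]
    denom = sqrt((both + a_only) * (b_only + neither) * (both + b_only) * (a_only + neither))
    return 0.0 if denom == 0 else (both * neither - a_only * b_only) / denom


# Toy labels for six images (made up for illustration).
mountain = [+1, +1, +1, -1, -1, -1]
castle = [+1, +1, -1, +1, -1, -1]
print(joint_label_counts(mountain, castle))
print(phi_coefficient(mountain, castle))
```

A positive value suggests that evidence for one concept could raise the prior for the other, which is the kind of inter-concept relation the transfer idea builds on.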
Model Concept Relations
Automatically construct an ontology in a data-driven manner
Application III: Event Dynamics in Concept Space
Event Detection with Concept Dynamics
W. Jiang et al., Semantic event detection based on visual concept prediction, ICME, Germany, 2008.
Open Problems
Cross-dataset gap
◦ Generalize from the LSCOM dataset to other datasets (e.g., non-news video datasets)
Cross-domain gap
◦ Text scripts are associated with news videos
◦ Can they help information extraction for visual concepts?
Automatic ontology construction
◦ Task-dependent vs. task-independent
◦ Data-driven vs. prior knowledge (e.g., WordNet)
◦ Incorporating prior human knowledge (logical relations, etc.)
TRECVID Competition
Task I: High-Level Feature Extraction
◦ Input: subshot
◦ Output: detection results for the 39 LSCOM-Lite concepts in the subshot
High-Level Feature Extraction
Each concept is assumed to be binary (absent or present) in each subshot
Submission: find the subshots that contain a given concept, rank them by detection confidence score, and submit the top 2000.
Evaluation: NIST evaluated 20 medium-frequency concepts out of the 39 using a 50% random sample of all the submission pools
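The submission step is a ranking followed by a cutoff. A minimal Python sketch, assuming the detection scores for one concept are held in a dict keyed by hypothetical subshot IDs; heapq.nlargest returns the k highest-scoring entries in descending order without sorting the whole pool.

```python
import heapq


def top_k_submission(scores, k=2000):
    """Rank subshots by detection confidence (highest first) and keep the top k.

    scores: dict mapping a subshot ID to its confidence for one concept.
    """
    best = heapq.nlargest(k, scores.items(), key=lambda item: item[1])
    return [subshot_id for subshot_id, _ in best]


# Hypothetical subshot IDs and scores for a single concept.
scores = {"shot12_3": 0.20, "shot40_7": 0.91, "shot05_1": 0.55, "shot33_2": 0.73}
print(top_k_submission(scores, k=3))  # ['shot40_7', 'shot33_2', 'shot05_1']
```

The function would be run once per concept, since each of the 39 concepts gets its own ranked list.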
20 Evaluated Concepts
Evaluation Metric: Average Precision
Relevant subshots should be ranked higher than irrelevant ones.
$$\text{Average Precision} = \frac{1}{R}\sum_{j=1}^{N}\frac{R_j}{j}\,I_j$$
where R is the total number of relevant images, N is the length of the ranked list, R_j is the number of relevant images among the top j images, and I_j is 1 if the jth image is relevant and 0 otherwise.
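The Average Precision definition translates directly into a single pass over the ranked list. A Python sketch, assuming the ranked results are given as a sequence of 0/1 relevance judgments (the I_j values) and R is passed separately, since relevant items that were never retrieved still count in the denominator.

```python
def average_precision(relevance, total_relevant):
    """AP = (1/R) * sum_{j=1..N} (R_j / j) * I_j.

    relevance: ranked list of I_j values (1 = relevant, 0 = irrelevant), best first.
    total_relevant: R, all relevant items, including any never retrieved.
    """
    if total_relevant == 0:
        return 0.0
    hits = 0      # R_j: relevant items seen in the top j
    total = 0.0
    for j, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / j  # the term is zero whenever I_j = 0
    return total / total_relevant


print(average_precision([1, 0, 1, 0], total_relevant=2))
```

For the ranking [relevant, irrelevant, relevant, irrelevant] with R = 2 this gives (1 + 2/3) / 2, about 0.833; moving the second relevant item up to rank 2 would raise it to 1.0.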
Results
TRECVID Competition
Task II: Video Search
◦ Input: 24 text-based topics
◦ Output: relevant subshots in the database
Topics to Search
Topics to Search (cont’d)
Topics to Search
Three Types of Search Systems
Results: Automatic Runs
Results: Manual Runs
Results: Interactive Runs
Machine Problem 7: Shot Boundary Detection in Videos
Goals
Detect abrupt content changes between consecutive frames
◦ Scene changes
◦ Scene cuts
Steps
Step 1: Measure the change of content between video frames
◦ Visual/acoustic measurements
Step 2: Compute the content distance between successive frames. If the distance is larger than a certain threshold, a shot boundary may exist.
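Step 2 can be sketched as follows, assuming each frame has already been reduced to a feature vector (for example a color histogram) and using the L1 distance between successive frames; the distance choice and the threshold value are assumptions to be tuned, not prescribed by the assignment.

```python
def l1_distance(a, b):
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def detect_boundaries(frame_features, threshold):
    """Return each index i where the distance from frame i-1 to frame i exceeds the threshold.

    A returned i means a shot boundary may lie between frames i-1 and i.
    """
    return [
        i
        for i in range(1, len(frame_features))
        if l1_distance(frame_features[i - 1], frame_features[i]) > threshold
    ]


# Toy 2-bin "histograms" for four frames: a cut between frames 1 and 2.
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(detect_boundaries(frames, threshold=1.0))  # [2]
```

Lowering the threshold to 0.1 on the same toy frames flags every transition, which illustrates why the choice of threshold has to be justified in the report.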
Measuring Content Based on Visual Information
256-dimensional color histogram
◦ In RGB space, normalize r, g, b to [0,1]
◦ Color space: (nr, ng)
◦ 8×8 histogram
Color Histograms
Divide each image into four parts; each part has an 8×8 histogram, giving 256 features in total (4 × 64).
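A Python sketch of the 256-dimensional feature, assuming nr and ng are the normalized chromaticities nr = r/(r+g+b) and ng = g/(r+g+b) (the slide does not spell out the definition), each quantized into 8 bins, and assuming each quadrant histogram is normalized to sum to 1. The image is a list of rows of (r, g, b) tuples; black pixels, where r+g+b = 0, are mapped to neutral gray by assumption.

```python
def chromaticity(r, g, b):
    """Normalized chromaticity (nr, ng); both lie in [0, 1]."""
    s = r + g + b
    if s == 0:
        return 1 / 3, 1 / 3  # assumption: treat black pixels as neutral gray
    return r / s, g / s


def quadrant_histogram(image, bins=8):
    """Concatenate one bins x bins (nr, ng) histogram per image quadrant.

    image: list of rows, each a list of (r, g, b) tuples.
    Returns 4 * bins * bins floats (256 for bins=8); each quadrant sums to 1.
    """
    h, w = len(image), len(image[0])
    hh, hw = h // 2, w // 2
    quadrants = [(0, hh, 0, hw), (0, hh, hw, w), (hh, h, 0, hw), (hh, h, hw, w)]
    feature = []
    for r0, r1, c0, c1 in quadrants:
        hist = [0.0] * (bins * bins)
        count = 0
        for row in image[r0:r1]:
            for r, g, b in row[c0:c1]:
                nr, ng = chromaticity(r, g, b)
                # Quantize to a bin index; a value of exactly 1.0 goes in the last bin.
                i = min(int(nr * bins), bins - 1)
                j = min(int(ng * bins), bins - 1)
                hist[i * bins + j] += 1
                count += 1
        feature.extend(v / count if count else 0.0 for v in hist)
    return feature


red = [[(255, 0, 0)] * 2 for _ in range(2)]  # 2x2 all-red image
feature = quadrant_histogram(red)
print(len(feature), sum(feature))
```

Because chromaticity divides out intensity, this feature is less sensitive to brightness changes (such as a flash) than a raw RGB histogram, which helps avoid false boundaries.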
Acoustic Features
12 cepstral coefficients
Energy (sum of the squares of the raw signal samples)
Zero-crossing rate (ZCR)
ZCR = sum(|sign(S(2:N)) - sign(S(1:N-1))|)
Hint: normalize the energy so that it does not dominate when computing distances between successive frames.
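The energy and ZCR definitions can be sketched in Python as below, assuming non-overlapping frames of a fixed length and peak normalization of the energy as one possible way to follow the hint; the cepstral coefficients are omitted. The slide's ZCR formula adds 2 for each direct change from positive to negative or back, so it is twice the plain crossing count; divide by 2 if a plain count is preferred.

```python
def sign(x):
    """MATLAB-style sign: -1, 0, or +1."""
    return (x > 0) - (x < 0)


def energy(frame):
    """Sum of squares of the raw samples."""
    return sum(s * s for s in frame)


def zcr(frame):
    """sum(|sign(S(2:N)) - sign(S(1:N-1))|) over one frame."""
    return sum(abs(sign(b) - sign(a)) for a, b in zip(frame, frame[1:]))


def acoustic_features(signal, frame_len):
    """(normalized energy, ZCR) for each non-overlapping frame of frame_len samples."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    energies = [energy(f) for f in frames]
    # Peak normalization to [0, 1] keeps energy from dominating frame distances.
    peak = max(energies, default=0) or 1
    return [(e / peak, zcr(f)) for e, f in zip(energies, frames)]


print(acoustic_features([1, -1, 1, -1, 2, 2, 2, 2], frame_len=4))  # [(0.25, 6), (1.0, 0)]
```

Each tuple can then be fed to the same successive-frame distance and threshold used for the visual features, so the acoustic and visual detections are directly comparable.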
Datasets
Two videos, each a little over one minute long
Manually label the shot boundaries
What to Submit
Source code
Report
◦ Compare the shot boundary detection results returned by your algorithm with the manually labeled boundaries
◦ Explain your choice of threshold
◦ Explain the differences between the acoustic-based and visual-based detection results
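For the comparison with manually labeled boundaries, one common approach is to count a detection as correct if it falls within a small frame tolerance of a labeled boundary, then report precision and recall. The Python sketch below assumes boundaries are given as frame indices and uses greedy one-to-one matching; the tolerance of 2 frames is an arbitrary illustrative default, not part of the assignment.

```python
def match_boundaries(detected, truth, tolerance=2):
    """Greedy one-to-one matching of detected to labeled boundaries (frame indices).

    Each labeled boundary claims the first unclaimed detection within
    +/- tolerance frames. Returns (hits, precision, recall).
    """
    unmatched = sorted(detected)
    hits = 0
    for t in sorted(truth):
        for d in unmatched:
            if abs(d - t) <= tolerance:
                unmatched.remove(d)  # safe: the loop stops right after
                hits += 1
                break
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(truth) if truth else 0.0
    return hits, precision, recall


print(match_boundaries(detected=[10, 48, 90], truth=[11, 50, 70]))
```

Running the same comparison separately for the acoustic and the visual detections gives a concrete basis for explaining how the two differ.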
Where and When to Submit
Email to ece.ece.ece.417@gmail.com
Due: May 2nd
Thanks! Q&A