Naphade Li & Huang, NGDM 02 1
Discovering Recurrent Events in Multi-channel Data Streams using Unsupervised Methods
Milind R. Naphade, Chung-Sheng Li
Pervasive Media Management Group
IBM T. J. Watson Research Center
Hawthorne, NY
Contact: [email protected]
Thomas S. Huang
Image Formation & Processing Group
Beckman Institute for Advanced Science & Technology,
University of Illinois at Urbana-Champaign, Urbana, IL
Contact: [email protected]
Organization
• Mining in Multimodal Data Streams
• Detecting Structure/Recurring Events
• Ergodic + Non-ergodic HMMs
• Experiments with Different Domains
• Concluding Remarks
Multimedia Semantics
• The Semantics of Content: objects, sites, and events of interest in the video (ICIP 02)
• The Semantics of Context
• The Semantics of Structure/Recurrence: scenes, context changes, recurring temporal patterns, structural syntax
State of the Art
• Content Analysis:
  • Image/Video Classification: Naphade (UIUC), Vailaya (Michigan State), Iyengar & Vasconcelos (MIT), Smith (IBM)
  • Semantic Audiovisual Analysis: Naphade (UIUC), Chang (Columbia)
• Learning and Multimedia:
  • Statistical Media Learning: Naphade (UIUC), Forsyth (Berkeley), Fisher & Jebara (MIT), V. Iyengar (IBM)
  • Learning in Image Retrieval: Chang et al. (UCSB), Zhang et al. (Microsoft Research), Naphade et al. (UIUC), Viola et al. (MIT, MERL)
  • Linking Clusters in Media Features: Barnard & Forsyth (Berkeley), Slaney (IBM)
• Vision and Speech:
  • Computer Vision in Media Analysis: Bolle (IBM), Malik (Berkeley)
  • Auditory Scene Analysis & Discriminant ASR Models: Ellis (MIT), Nadas et al. (IBM), Gopalakrishnan et al. (IBM), Woodland et al. (Cambridge), Naphade et al. (UIUC), Wang et al. (NYU), Kuo et al. (USC)
Media Learning: A Perspective
[Chart: techniques plotted by degree of supervision against semantics achieved: Query by Example, Relevance Feedback, Unsupervised Segmentation, Boosting; SVM, NN, GMM, HMM-based classification, Multijects, Multinet, Supervised Segmentation, ASR, CASA; target region labeled "Future of Multimodal Mining"]

• More Supervision → More Semantics
• Semi-Autonomous Learning: clever techniques for supervision that reduce the amount of user input
Extracting Semantics: What Options?
• Manual: most accurate, most time-consuming, expensive, static
• Semi-automatic: possible, adaptive, challenging
• Fully Automated: for this to be possible and useful, we need Autonomous Learning

Challenge: in this realm, use "intelligence" and "learning" to move from left to right without compromising performance.

Signals (Past) → Features (Today) → Semantics (Goal): autonomous and user friendly
Challenges of Multimedia Learning
Problem → Approach:
• Tremendous variability and uncertainty → the framework must take uncertainty into account
• Small number of training examples (relative to feature dimensionality) → exhaustive training techniques such as those for ASR are not possible
• Complex distributions, highly non-linear decision boundaries, high-dimensional feature spaces → employ feature selection and dimensionality reduction; linear classifiers are not sufficient
• Manual annotation is time-consuming and expensive (the human barrier) → learning needs to be user-centric
• Dependence on a host of scientific disciplines for extracting good features, none of which has been perfected → must work around imperfect segmentation and single-channel auditory non-separability
• Multiple channels with possibly unknown relationships → need to fuse information

These are challenging problems not easily addressed by traditional approaches.
Media Learning: Proposed Architecture
[Architecture diagram with components: Retrieval/Summarization, Fusion, Feedback, Audio Features, Visual Features, Segmentation, Annotation, Multimedia Repository, Audio Models, Speech Models, Visual Models, Learning (models, features), Active Sample Selection, Knowledge Repository, Granularity/Resolution]

Key techniques:
• Active Learning
• Multiple Instance Learning
• Graphical Models for Decision Fusion
• Discovering Structures and Recurring Patterns
• SVM, GMM, HMM
Detecting the Semantics of Structure
Examples:
• News: the anchor person
• Sports (e.g. baseball): home run, pitch, strike-out
• Talk shows: monologue, laughter, applause, music
• Movies (e.g. action movies): explosions, gunshots

Challenges:
• Mapping features to semantics
• Evaluating a finite set of predefined hypotheses
• Granularity: structure exists at different granularities
• Multimodal fusion
Related Literature
• Early use of HMMs for capturing stationarity and transitions, and their application to clustering: A. B. Poritz; Levinson et al.
• Scene segmentation (using HMMs): Wolf; Ferman & Tekalp; Kender & Yeo; Liu, Huang & Wang; Sundaram & Chang; Divakaran & Chang.
• Multimodal scene similarity: Nakamura & Kanade; Nam, Cetin & Tewfik; Naphade, Wang & Huang; Srinivasan, Ponceleon, Amir & Petkovic; Adams et al.
Ergodic HMMs

Poritz showed how an ergodic model could capture repetitive patterns in speech signals through unsupervised clustering.

[Diagram: three states (1, 2, 3) with transitions permitted between every pair of states]

A possible state sequence: 1 1 1 2 3
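An ergodic topology is fully determined by a transition matrix with no structural zeros: every state can reach every other. A minimal sketch in Python (the probabilities are invented for illustration; Poritz's actual method also learns per-state observation densities via Baum-Welch, which is omitted here):

```python
import random

# Hypothetical 3-state ergodic transition matrix: all entries non-zero,
# so any state can follow any other. Row i gives P(next state | state i);
# states are numbered 1..3 as on the slide.
A = [
    [0.6, 0.2, 0.2],  # from state 1
    [0.3, 0.4, 0.3],  # from state 2
    [0.2, 0.3, 0.5],  # from state 3
]

def sample_state_sequence(A, start, length, seed=0):
    """Walk the Markov chain over hidden states for `length` steps."""
    rng = random.Random(seed)
    seq = [start]
    while len(seq) < length:
        probs = A[seq[-1] - 1]
        u, acc = rng.random(), 0.0
        for state, p in enumerate(probs, start=1):  # inverse-CDF sampling
            acc += p
            if u <= acc:
                seq.append(state)
                break
        else:  # guard against floating-point shortfall in the row sum
            seq.append(len(probs))
    return seq

print(sample_state_sequence(A, start=1, length=5))
```

Because no entry of A is zero, repeated patterns such as 1 1 1 2 3 can recur indefinitely as the chain revisits states.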
Non-Ergodic HMMs

Unlike the ergodic case, transitions from any state to any other state are not permitted.

[Diagram: left-to-right chain of states 1 → 2 → 3]

A possible state sequence: 1 1 2 3
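The left-to-right constraint amounts to structural zeros in the transition matrix. A small sketch (probabilities invented, not from the paper):

```python
# In a non-ergodic (left-to-right) topology, zeros in the transition matrix
# forbid backward jumps: state i may only stay in i or advance to i + 1.
A_left_right = [
    [0.7, 0.3, 0.0],  # state 1 -> {1, 2}
    [0.0, 0.6, 0.4],  # state 2 -> {2, 3}
    [0.0, 0.0, 1.0],  # state 3 absorbs within the branch
]

def is_left_to_right(A):
    """True if no transition leads to an earlier state (lower triangle is zero)."""
    return all(p == 0.0
               for i, row in enumerate(A)
               for j, p in enumerate(row) if j < i)

print(is_left_to_right(A_left_right))  # prints True
```

Sequences like 1 1 2 3 are possible under this matrix, whereas 1 2 1 is not.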
Capturing Short-Term Stationarity and Long-Term Structure

[Diagram: several left-to-right branches of states 1 → 2 → 3, linked through shared states D]

• Each branch: non-ergodic
• All branches embedded in a hierarchical ergodic structure
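One way to realize this hierarchical topology is to assemble a single large transition matrix: each branch is strictly left-to-right internally, every branch exit feeds a shared state, and that shared state can enter any branch, making the branch level ergodic. A sketch with invented probabilities and a single shared state D assumed for simplicity:

```python
# Compose a hierarchical HMM transition matrix: N_BRANCHES left-to-right
# branches of BRANCH_LEN states each, plus one shared state D. Within a
# branch, transitions only stay or advance; each branch's last state can
# exit to D; D re-enters any branch uniformly (the ergodic level).
N_BRANCHES, BRANCH_LEN = 3, 3
D = N_BRANCHES * BRANCH_LEN          # index of the shared state D
SIZE = D + 1

A = [[0.0] * SIZE for _ in range(SIZE)]
for b in range(N_BRANCHES):
    base = b * BRANCH_LEN
    for k in range(BRANCH_LEN - 1):
        A[base + k][base + k] = 0.6          # self-loop: short-term stationarity
        A[base + k][base + k + 1] = 0.4      # advance within the branch
    last = base + BRANCH_LEN - 1
    A[last][last] = 0.5
    A[last][D] = 0.5                         # branch exit -> shared state D
for b in range(N_BRANCHES):
    A[D][b * BRANCH_LEN] = 1.0 / N_BRANCHES  # D -> any branch entry

# every row is a valid probability distribution
assert all(abs(sum(row) - 1.0) < 1e-9 for row in A)
```

Each branch then models one short-term stationary pattern, while the chain's repeated returns through D capture the long-term recurrence structure.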
Experimental Setup
Domains:
• Action videos (20 clips from "Specialist")
• Late-night shows (20 minutes of Dave Letterman)

Features:
• Visual (30 frames/sec): color (HSV histogram), structure (edge direction histogram)
• Audio (30 audio frames/sec, to sync with video): 32 Mel-frequency cepstral coefficients (10 ms overlap)
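The visual color feature is an HSV histogram. As a toy illustration of how such a feature vector is formed (the bin count, pixel format, and normalization here are arbitrary choices for the sketch, not the authors'):

```python
import colorsys

def hsv_hue_histogram(rgb_pixels, bins=8):
    """Toy HSV color-histogram feature: bin each pixel by hue.

    rgb_pixels: iterable of (r, g, b) tuples with components in [0, 1].
    Returns `bins` counts normalized to sum to 1.
    """
    hist = [0] * bins
    n = 0
    for r, g, b in rgb_pixels:
        h, _s, _v = colorsys.rgb_to_hsv(r, g, b)
        hist[min(int(h * bins), bins - 1)] += 1  # clamp h == 1.0 into last bin
        n += 1
    return [c / n for c in hist]

# A mostly red patch with one blue pixel: hue mass lands in two distinct bins.
print(hsv_hue_histogram([(1.0, 0.0, 0.0)] * 3 + [(0.0, 0.0, 1.0)]))
```

Per-frame histograms like this become the observation vectors that the HMM's per-state densities model.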
Results: Recurring Patterns in Video
Movie: Specialist
Discovered Recurring Pattern: Explosion
S1_d10.asx
Results: Recurring Patterns in Video
Late Night Show with Dave Letterman
Discovered Patterns: Applause, Laughter, Speech, Music
latenight_s1_d100.asx
latenight_s16_d15.asx
Laughter, Applause
Observations
• Completely unsupervised.
• For recurring temporal event patterns, the scheme is capable of discovering them provided a sufficient number of these patterns occurs in the set.
• For repetitive anchoring events, such as applause in comedy shows, the scheme is capable of discovering these events.
• Segmentation and pattern discovery are very helpful for annotation: e.g., to manually annotate Dave Letterman's jokes, just look before the applause.
• Anyone who has done manual audio annotation knows how useful it is to get the right segment boundaries, especially at the micro and macro levels.
Summary
Problem: automatic discovery of recurring temporal patterns without supervision.

Approach:
• Clustering: unsupervised temporal clustering using a hierarchical ergodic model with non-ergodic temporal pattern models.
• Interaction: the user then needs to analyze only the extracted recurring set to quickly propagate annotation.

Results:
• Automatic extraction of recurring patterns (laughter, explosion, monologue, etc.) and regular structure.
• Near-complete elimination of manual annotation: annotating clusters takes orders of magnitude less effort than annotating content.
Future Directions
• Experiment with different non-ergodic branches as well as across-branch transitions.
• Use this to bootstrap training of semantic events that can be detected using HMMs/DBNs (ICIP 98, NIPS 2000).
• Explore regionally extracted visual features to model a richer class of recurring patterns.
• Experiment with the sports domain (possible interaction with Prof. Chang and his group).