discovering recurrent events in multi-channel data streams using unsupervised methods



Page 1: Discovering Recurrent Events in Multi-channel Data Streams  using Unsupervised Methods

Naphade Li & Huang, NGDM 02 1

Discovering Recurrent Events in Multi-channel Data Streams using Unsupervised Methods

Milind R. Naphade, Chung-Sheng Li

Pervasive Media Management Group

IBM T. J. Watson Research Center

Hawthorne, NY

Contact: [email protected]

Thomas S. Huang

Image Formation & Processing Group

Beckman Institute for Advanced Science & Technology,

University of Illinois at Urbana-Champaign, Urbana, IL

Contact: [email protected]

Page 2:

Organization

• Mining in Multimodal Data Streams
• Detecting Structure/Recurring Events
• Ergodic+Non-ergodic HMMs
• Experiments with Different Domains
• Concluding Remarks

Page 3:

Multimedia Semantics

• The Semantics of Contents
  • Objects, Sites and Events of Interest in the Video (ICIP 02)
• The Semantics of Context
• The Semantics of Structure/Recurrence
  • Scenes
  • Context Changes
  • Recurring Temporal Patterns
  • Structural Syntax

Page 4:

State of the Art

• Content Analysis:
  • Image/Video Classification: Naphade (UIUC), Vailaya (Michigan State), Iyengar & Vasconcelos (MIT), Smith (IBM).
  • Semantic Audiovisual Analysis: Naphade (UIUC), Chang (Columbia).
• Learning and Multimedia:
  • Statistical Media Learning: Naphade (UIUC), Forsyth (Berkeley), Fisher & Jebara (MIT), V. Iyengar (IBM).
  • Learning in Image Retrieval: Chang et al. (UCSB), Zhang et al. (Microsoft Research), Naphade et al. (UIUC), Viola et al. (MIT, MERL).
  • Linking Clusters in Media Feature: Barnard & Forsyth (Berkeley), Slaney (IBM).
• Vision and Speech:
  • Computer Vision in Media Analysis: Bolle (IBM), Mallik (Berkeley).
  • Auditory Scene Analysis & Discriminant ASR Models: Ellis (MIT), Nadas et al. (IBM), Gopalkrishnan et al. (IBM), Woodland et al. (Cambridge), Naphade et al. (UIUC), Wang et al. (NYU), Kuo et al. (USC).

Page 5:

Media Learning: A Perspective

[Figure: learning techniques placed on axes labeled Supervision and Semantics. Labels in the figure: Query by Example; Relevance Feedback; Unsupervised Segmentation; Boosting; SVM, NN, GMM, HMM-based classification, Multijects, Multinet; Supervised Segmentation, ASR, CASA; Future of Multimodal Mining]

• More Supervision, More Semantics
• Semi-Autonomous Learning: Clever techniques for supervision that reduce the amount of user input

Page 6:

Extracting Semantics: What Options?

Manual: Most accurate. Most time consuming. Expensive. Static.

Semi-automatic: Possible. Adaptive. Challenging.

Fully Automated: Future. For this to be possible and useful, we need Autonomous Learning.

Signals, Features, Semantics

Past, Today, Goal

Challenge: In this realm use “intelligence” and “learning” to move from left to right without compromising on performance.

Autonomous and User Friendly

Page 7:

Challenges of Multimedia Learning

Problem: Tremendous variability and uncertainty.
Approach: Framework must take uncertainty into account.

Problem: Small number of training examples (relative to feature dimensionality).
Approach: Exhaustive training techniques such as those for ASR are not possible.

Problem: Complex distributions, highly non-linear decision boundaries, high-dimensional feature spaces.
Approach: Employ feature selection and dimensionality reduction. Linear classifiers are not sufficient.

Problem: Manual annotation is time-consuming and expensive, a human barrier.
Approach: Learning needs to be user-centric.

Problem: Dependence on a host of scientific disciplines for extracting good features, none of which have been perfected.
Approach: Must get around imperfect segmentation and single-channel auditory non-separability.

Problem: Multiple channels with possible relationships that are unknown.
Approach: Need to fuse information.

• Challenging problems not easily addressed by traditional approaches.

Page 8:

Media Learning: Proposed Architecture

[Diagram: proposed architecture. Recoverable labels: Multimedia Repository; Segmentation; Annotation; Audio Features; Visual Features; Audio Models; Speech Models; Visual Models; Learning models/features; Fusion; Feedback; Retrieval/Summarization; Active Sample Selection; Knowledge Repository; Granularity/Resolution; Active Learning; Multiple Instance Learning; Graphical Models for Decision Fusion; Discovering Structures and Recurring Patterns; SVM, GMM, HMM]

Page 9:

Detecting the Semantics of Structure

Examples
• News: The Anchor Person
• Sports, e.g. Baseball: Homerun, Pitch, Strike-Out
• Talk-shows: Monologue, Laughter, Applause, Music
• Movies, e.g. Action Movies: Explosions, Gunshots

Challenges
• Mapping features to semantics.
• Evaluating a finite set of predefined hypotheses.
• Granularity: Structure exists at different granularities.
• Multimodal Fusion.

Page 10:

Related Literature

Early Use of HMMs for capturing stationarity and transition and its application to clustering: A. B. Poritz, Levenson et al.

Scene Segmentation (using HMMs): Wolf, Ferman & Tekalp; Kender & Yeo; Liu, Huang & Wang; Sundaram and Chang, Divakaran & Chang.

Multimodal scene similarity: Nakamura & Kanade; Nam Cetin & Tewfik; Naphade, Wang & Huang; Srinivasan, Ponceleon; Amir and Petkovic; Adams et al.

Pages 11-16:

Ergodic HMMs

Poritz showed how an ergodic model could capture repetitive patterns in speech signals through unsupervised clustering.

[Diagram: three states, 1, 2 and 3]

A Possible State Sequence (extended by one state per page):

Page 11: 1
Page 12: 1 1
Page 13: 1 1 2
Page 14: 1 1 2 3
Pages 15-16: 1 11 2 3
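One way to make the ergodic property concrete is a transition matrix in which every entry is nonzero, so any state may follow any other. The sketch below assumes exactly that reading; the matrix values and the names `ERGODIC_A`, `sample_states` and `is_fully_connected` are hypothetical and do not come from the slides.

```python
import random

# Hypothetical 3-state transition matrix (rows sum to 1). Every entry is
# nonzero, so any state can follow any other state. Values are illustrative.
ERGODIC_A = [
    [0.6, 0.3, 0.1],
    [0.2, 0.6, 0.2],
    [0.3, 0.1, 0.6],
]


def sample_states(trans, start, length, rng):
    """Sample a state path (0-based indices) from a transition matrix."""
    path = [start]
    for _ in range(length - 1):
        row = trans[path[-1]]
        path.append(rng.choices(range(len(row)), weights=row)[0])
    return path


def is_fully_connected(trans):
    """True if every state can move to every state in a single step."""
    return all(p > 0 for row in trans for p in row)


rng = random.Random(0)
path = sample_states(ERGODIC_A, start=0, length=10, rng=rng)
print([s + 1 for s in path])  # printed 1-based, like the state labels on the slide
```

Large self-transition probabilities keep the model in one state for a while, which is how such a model can group stretches of similar observations.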

Pages 17-20:

Non Ergodic HMMs

Unlike the ergodic case, a transition from any state to any other state is not permitted.

[Diagram: states 1, 2 and 3 in a row]

A Possible State Sequence (extended by one state per page):

Page 17: 1
Page 18: 1 1
Page 19: 1 1 2
Page 20: 1 1 2 3
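A minimal sketch of a non-ergodic model, assuming a left-to-right topology in which each state may only repeat or advance to the next state. The slides do not give the transition structure or probabilities, so `LEFT_TO_RIGHT_A` and the helper names are hypothetical.

```python
import random

# Hypothetical left-to-right transition matrix: zeros forbid moving backwards
# or skipping a state, so the model is not ergodic.
LEFT_TO_RIGHT_A = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
]


def is_allowed(path, trans):
    """True if every consecutive pair in the path has nonzero probability."""
    return all(trans[a][b] > 0 for a, b in zip(path, path[1:]))


def sample_states(trans, start, length, rng):
    """Sample a state path (0-based indices) from a transition matrix."""
    path = [start]
    for _ in range(length - 1):
        row = trans[path[-1]]
        path.append(rng.choices(range(len(row)), weights=row)[0])
    return path


slide_sequence = [0, 0, 1, 2]  # the sequence 1 1 2 3, written 0-based
print(is_allowed(slide_sequence, LEFT_TO_RIGHT_A))        # permitted path
print(is_allowed(slide_sequence + [0], LEFT_TO_RIGHT_A))  # returning to state 1 is not

sampled = sample_states(LEFT_TO_RIGHT_A, 0, 8, random.Random(1))
```

Any path sampled from this matrix is non-decreasing, which is the ordering constraint that lets such a model represent a temporal pattern with a beginning, middle and end.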

Pages 21-22:

Capturing Short-Term Stationarity and Long-Term Structure

[Diagram: three branches, each a row of states 1, 2 and 3, joined through two nodes labeled D]

• Each branch: non-ergodic

• All branches embedded in a hierarchical ergodic structure
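The two bullets can be sketched as a single block-structured transition matrix: inside a branch, states only repeat or advance, and the last state of every branch may hand over to the first state of any branch. This is an illustration under assumptions, not the authors' construction; the uniform hand-over, the `stay` probability and the function name `build_hierarchical` are all hypothetical, and the nodes labeled D on the slide are not modeled.

```python
def build_hierarchical(num_branches, states_per_branch, stay=0.5):
    """Build a transition matrix of left-to-right branches joined ergodically.

    Assumptions (hypothetical): each state repeats with probability `stay`;
    the last state of a branch jumps to the entry state of any branch with
    equal probability.
    """
    n = num_branches * states_per_branch
    trans = [[0.0] * n for _ in range(n)]
    for b in range(num_branches):
        base = b * states_per_branch
        for s in range(states_per_branch):
            i = base + s
            trans[i][i] += stay
            if s < states_per_branch - 1:
                # Inside a branch: only advance to the next state.
                trans[i][i + 1] += 1.0 - stay
            else:
                # Branch exit: any branch (including this one) may start next.
                for e in range(num_branches):
                    trans[i][e * states_per_branch] += (1.0 - stay) / num_branches
    return trans


A = build_hierarchical(num_branches=3, states_per_branch=3)
for row in A:
    print(" ".join(f"{p:.2f}" for p in row))
```

The printed matrix shows three 3x3 left-to-right blocks on the diagonal, with the only off-block entries sitting in the rows of the branch exit states.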

Page 23:

Experimental Setup

Domains
• Action Videos (20 clips from “Specialist”)
• Late Night Shows (20 min of Dave Letterman)

Features
• Visual (30 frames/sec)
  • Color (HSV histogram)
  • Structure (Edge Direction histogram)
• Audio (30 audio frames/sec to sync with video)
  • 32 Mel Frequency Cepstral Coefficients (10 ms overlap)
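As an illustration of the color feature, the sketch below computes a normalized joint HSV histogram for one frame using only the Python standard library. The bin counts and the function name `hsv_histogram` are assumptions; the slides do not state how the histogram was quantized.

```python
import colorsys


def hsv_histogram(pixels, bins=(8, 2, 2)):
    """Normalized joint HSV histogram of 8-bit RGB pixels, as a flat list.

    `bins` gives the (hue, saturation, value) bin counts; these are
    hypothetical choices for illustration.
    """
    h_bins, s_bins, v_bins = bins
    hist = [0.0] * (h_bins * s_bins * v_bins)
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # Clamp so a channel value of exactly 1.0 lands in the last bin.
        hi = min(int(h * h_bins), h_bins - 1)
        si = min(int(s * s_bins), s_bins - 1)
        vi = min(int(v * v_bins), v_bins - 1)
        hist[(hi * s_bins + si) * v_bins + vi] += 1.0
    total = sum(hist)
    return [count / total for count in hist] if total else hist


# A toy four-pixel "frame": two red pixels, one green, one blue.
frame = [(255, 0, 0), (255, 0, 0), (0, 255, 0), (0, 0, 255)]
hist = hsv_histogram(frame)
print([(i, p) for i, p in enumerate(hist) if p > 0])
```

Computing one such vector per video frame, at 30 frames per second, gives a visual observation sequence that lines up with 30 audio frames per second.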

Page 24:

Results: Recurring Patterns in Video

Movie: Specialist

Discovered Recurring Pattern: Explosion

S1_d10.asx

Page 25:

Results: Recurring Patterns in Video

Late Night Show with Dave Letterman

Discovered Patterns: Applause, Laughter, Speech, Music

latenight_s1_d100.asx

latenight_s16_d15.asx

Laughter

Applause

Page 26:

Observations

• Completely UNSUPERVISED.
• For recurring temporal event patterns, this scheme can discover them if the set contains a sufficient number of these patterns.
• For repetitive anchoring events, such as Applause in Comedy Shows, the scheme can discover these events.
• Segmentation and Pattern Discovery are very helpful in annotation. E.g., to manually annotate Dave Letterman’s jokes, just look before the applause.
• Anyone who has done manual audio annotation knows how useful it is to get the right segment boundaries, especially at the micro and macro level.
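The point about segment boundaries can be illustrated with a small sketch: once every frame carries a discovered pattern label, run-length encoding the labels yields segments with start and end times. The label names, the toy input and the function name `segments_from_labels` are hypothetical; 30 frames per second matches the frame rate stated in the experimental setup.

```python
from itertools import groupby


def segments_from_labels(labels, frame_rate=30.0):
    """Run-length encode per-frame labels into (label, start_sec, end_sec)."""
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        segments.append((label, start / frame_rate, (start + length) / frame_rate))
        start += length
    return segments


# Toy input: 2 s of speech, 1 s of applause, 1 s of speech at 30 frames/sec.
frames = ["speech"] * 60 + ["applause"] * 30 + ["speech"] * 30
segments = segments_from_labels(frames)
for label, start_sec, end_sec in segments:
    print(f"{label}: {start_sec:.1f}s to {end_sec:.1f}s")
```

With boundaries like these, an annotator looking for a joke only needs to inspect the segment that ends where an applause segment begins.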

Page 27:

Summary

Problem: Automatic discovery of recurring temporal patterns without supervision.

Approach:
• Clustering: Unsupervised temporal clustering using a hierarchical ergodic model with non-ergodic temporal pattern models.
• Interaction: The user then needs to analyze only the extracted recurring set to quickly propagate annotation.

Results:
• Automatic extraction of recurring patterns (laughter, explosion, monologue etc.) and regular structure.
• Near-complete elimination of manual annotation. Annotating clusters takes orders of magnitude less effort than annotating content.
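The annotation saving can be sketched as a lookup: the user names each discovered cluster once, and every segment assigned to that cluster inherits the name. The cluster ids, names and the function `propagate_annotations` are hypothetical illustrations, not results from the talk.

```python
def propagate_annotations(segment_clusters, cluster_names):
    """Give each segment the name the user assigned to its cluster.

    Segments whose cluster has no name yet are left as None.
    """
    return [cluster_names.get(cluster) for cluster in segment_clusters]


# Toy data: cluster id of each discovered segment, in temporal order.
segment_clusters = [0, 1, 0, 2, 1, 0, 0, 2]
# The user inspects two clusters and names them; cluster 2 is left unnamed.
cluster_names = {0: "speech", 1: "applause"}

labels = propagate_annotations(segment_clusters, cluster_names)
labeled = sum(1 for name in labels if name is not None)
print(labels)
print(f"{len(cluster_names)} manual labels annotated {labeled} segments")
```

The manual effort grows with the number of clusters rather than the number of segments, which is where the reduction comes from when a pattern recurs many times.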

Page 28:

Future Directions

Experiment with different non-ergodic branches as well as across branch transitions

Use this to bootstrap training of semantic events that can be detected using HMMs/DBNs (ICIP 98, NIPS 2000).

Explore visual features extracted regionally to model richer class of recurring patterns.

Experimenting with the Sports Domain (possible interaction with Prof. Chang and his group)