Graph based machine learning with applications to media analytics Lei Ding, PhD 9-1-2011 with collaborators at

Page 1: Graph based machine learning with applications to media …files.meetup.com/1542972/GraphBasedMachineLearning.pdf · 2011-09-02 · – Shot detection and merging based on key-frame

Graph based machine learning with applications to media analytics

Lei Ding, PhD

9-1-2011

with collaborators at

Page 2

Outline

•  Graph based machine learning
   –  Basic structures
   –  Algorithms
   –  Examples

•  Applications in media analytics
   –  Social analysis of videos
   –  Content analysis of images

Page 4

What is a graph

Not the graph we are going to talk about

Page 6

What is a graph

•  A graph is composed of
   –  Vertices (nodes): pixels, actors in videos, genes, ads, etc.
   –  Edges: their relations
   –  In machine learning, we are interested in predicting some quantity (a class label, or a continuous value) at each unlabeled vertex
•  Broadly speaking, there are two kinds of graphs: undirected and directed

Page 7

Graph based machine learning for media analytics

•  Oftentimes, media content can be represented using graphs
•  Therefore, challenging inference problems with media content can be answered by learning on graphs

Page 8

Social content model

Content generation process

Content network encodes content similarity (videos, audio, etc.)

Social network encodes people’s social connections

Can be used for media genre classification, media recommendation, etc.

Page 9

Graph based machine learning

•  On undirected graphs
   –  Optimization based approaches (e.g. energy minimization)
   –  Probabilistic models (e.g. random fields)

•  On directed graphs
   –  Optimization based approaches (e.g. directed energy minimization)
   –  Probabilistic models (e.g. latent Dirichlet allocation, Bayesian networks)

Page 10

Relations

•  How are they related to traditional statistical learning (e.g. logistic regression)?

(Sutton & McCallum, 2007)

Page 12

Learning on undirected graphs

•  Classification methods
   –  We have some labeled data, and want to predict labels for the rest
   –  e.g. manifold regularization

•  Clustering methods
   –  We would like to partition data into clusters
   –  e.g. spectral clustering

Page 13

Constructing data graphs

•  How to transform a dataset {x_i}, i = 1..m, into a graph

Page 14

Affinity matrix

•  A graph is usually represented using an affinity matrix W, whose entry W_ij is 1 if vertices i and j are connected, and 0 otherwise
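One common way to get from a dataset to such a 0/1 affinity matrix is a k-nearest-neighbor graph. The sketch below is only an illustration under that assumption; the slides do not say which construction was used, and the function name `knn_affinity`, the choice k=2, and the toy points are all made up for the example.

```python
import numpy as np

def knn_affinity(X, k=2):
    """Return a symmetric 0/1 affinity matrix W for the rows of X (kNN graph)."""
    m = X.shape[0]
    # Pairwise squared Euclidean distances between all points.
    diff = X[:, None, :] - X[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(dist2, np.inf)  # a point is not its own neighbor
    W = np.zeros((m, m), dtype=int)
    nearest = np.argsort(dist2, axis=1)[:, :k]
    for i in range(m):
        W[i, nearest[i]] = 1
    # Symmetrize: connect i and j if either lists the other as a neighbor.
    return np.maximum(W, W.T)

# Toy data (illustrative): two well-separated groups of three points.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
W = knn_affinity(X, k=2)
```

With this toy data the two groups end up as two disconnected triangles, which is the kind of structure the later clustering slides rely on.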

Page 15

Graph Laplacians

•  L = D - W, where W is an affinity matrix and D is the diagonal matrix of its row sums

•  A discretization of the Laplace-Beltrami operator on manifolds, which is the sum of second order derivatives on the tangent space (more details later)
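As a small concrete check of the definition L = D - W, here is a sketch on a 4-vertex path graph. The graph is an illustrative choice, not one from the slides.

```python
import numpy as np

# Affinity matrix of a 4-vertex path graph 0 - 1 - 2 - 3 (illustrative data).
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

D = np.diag(W.sum(axis=1))   # diagonal matrix of row sums (vertex degrees)
L = D - W                    # unnormalized graph Laplacian

# eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
eigenvalues = np.linalg.eigvalsh(L)
```

Every row of L sums to zero, so the constant vector is an eigenvector with eigenvalue 0, and the remaining eigenvalues are non-negative.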

Page 16

Function on graph

•  A vector can be used to represent a function over the graph
   –  We can encode what we already know or what we want to predict in a label function
   –  For example, in this graph a vertex can represent a person, and the function can represent whether that person is a likely customer

[Figure: six-vertex graph with a 0/1 label at each vertex]

f = [1, 1, 0, 0, 1, 0]^T

Page 17

Eigenvectors reviewed

Page 19

Properties of graph Laplacians

•  Symmetric and positive semi-definite
•  Graph Laplacian induces a smoothness term
   –  f^T L f: transposed label function f times Laplacian matrix L times label function f (always non-negative)
   –  The smoothness term f^T L f measures how much the function f varies with respect to the underlying graph
   –  We have labels on some vertices, and want to predict labels on other vertices. A smooth function (small f^T L f) typically predicts well
•  Laplacian eigenvectors with small eigenvalues can be used for data clustering / classification, data set parametrization, image segmentation, etc.
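To make the smoothness term concrete: for a 0/1 affinity matrix, f^T L f equals the sum of (f_i - f_j)^2 over the edges, so it counts how strongly f disagrees across connected vertices. The sketch below uses a made-up graph (two triangles joined by one edge) and two made-up label functions; none of this is taken from the slides.

```python
import numpy as np

def smoothness(W, f):
    """Return f^T L f for the unnormalized Laplacian L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    return float(f @ L @ f)

# Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2 - 3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
W = np.zeros((6, 6))
for i, j in edges:
    W[i, j] = W[j, i] = 1.0

f_smooth = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])  # constant on each triangle
f_rough = np.array([1.0, 0.0, 1.0, 0.0, 1.0, 0.0])   # alternates within triangles

s_smooth = smoothness(W, f_smooth)  # only edge 2 - 3 disagrees: 1.0
s_rough = smoothness(W, f_rough)    # five edges disagree: 5.0
```

The function that follows the cluster structure has the smaller value, which is the sense in which a smooth label function "respects" the graph.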

Now we are ready to see the algorithms, but let’s take a little break to understand things even further

Page 20

Manifolds

Page 21

Manifold perspective of data modeling

Page 22

Why graphs encode underlying data geometry

If we consider data as samples from an underlying manifold (which is a fairly weak assumption), and construct the corresponding adjacency graph, then the eigenvectors of the graph Laplacian approximate the eigenfunctions of the Laplace-Beltrami operator of the underlying data manifold

(Belkin & Niyogi, 2008)

Page 23

Laplacian eigenvectors “understand” geometry

(Rustamov, 2007)

Page 24

Spectral clustering

More information in von Luxburg (2007)

Page 25

Spectral clustering explained

•  Why are the eigenvectors of L with small eigenvalues used as the new representation?
•  The minimizers f_i of the total smoothness term (the sum of f_i^T L f_i over mutually orthonormal f_i) are eigenvectors of L with the smallest eigenvalues
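A minimal sketch of the two-cluster case, assuming the unnormalized Laplacian and a connected graph. The constant eigenvector (eigenvalue 0) carries no information, so the split uses the sign of the eigenvector with the second-smallest eigenvalue (the Fiedler vector). For more than two clusters one would typically run K-means on several eigenvectors instead, as described in von Luxburg (2007). The function name and the toy graph are illustrative.

```python
import numpy as np

def spectral_bipartition(W):
    """Split a connected graph into two clusters using the Fiedler vector."""
    L = np.diag(W.sum(axis=1)) - W
    # eigh returns eigenvalues in ascending order with orthonormal eigenvectors.
    eigenvalues, eigenvectors = np.linalg.eigh(L)
    fiedler = eigenvectors[:, 1]  # eigenvector of the second-smallest eigenvalue
    return (fiedler > 0).astype(int)

# Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2 - 3.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0

labels = spectral_bipartition(W)
```

Which triangle receives label 0 and which receives label 1 depends on the arbitrary sign of the eigenvector, so only the grouping is meaningful.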

Page 26

Results

Page 27

Laplacian eigenmap

•  Uses the Laplacian eigenvectors with the smallest eigenvalues as the new representation
•  Can be seen as a non-linear extension of PCA

(Belkin & Niyogi, 2003)
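A rough sketch of the embedding step, assuming the generalized eigenproblem L y = lambda D y used in Belkin & Niyogi (2003) and a graph without isolated vertices. It is solved here through the symmetric normalized Laplacian so that only NumPy is needed. The ring graph and the function name are illustrative choices.

```python
import numpy as np

def laplacian_eigenmap(W, dim=2):
    """Embed vertices using the `dim` smallest non-trivial solutions of L y = lambda D y."""
    d = W.sum(axis=1)
    L = np.diag(d) - W
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # D^{-1/2} L D^{-1/2} is symmetric and has the same eigenvalues as the generalized problem.
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    eigenvalues, V = np.linalg.eigh(L_sym)   # ascending eigenvalues
    Y = d_inv_sqrt[:, None] * V              # map back: y = D^{-1/2} v
    # Skip the first (trivial, eigenvalue 0) solution.
    return Y[:, 1:dim + 1], eigenvalues[1:dim + 1]

# Ring graph on 8 vertices: each vertex is connected to its two neighbors.
n = 8
W = np.roll(np.eye(n), 1, axis=1) + np.roll(np.eye(n), -1, axis=1)

Y, lam = laplacian_eigenmap(W, dim=2)
```

Each row of `Y` is the new two-dimensional representation of one vertex; a linear method can then be run on these coordinates.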

Page 28

Results on real data

•  Transform data using Laplacian eigenmap, and use linear regression on the new representation

(Belkin & Niyogi, 2004)

Page 29

Manifold regularization

•  A comprehensive regularization framework

•  Through applying the representer theorem in functional analysis, the optimal solution is as follows

(Belkin et al., 2006)
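The formulas on this slide did not survive in the transcript. For orientation only, the objective and the representer-theorem form of its minimizer are usually written as below in the cited paper. The notation is taken from that paper and not from the slide: l labeled and u unlabeled examples, loss V, kernel K with RKHS norm, graph Laplacian L, weights gamma_A and gamma_I, and the vector f = [f(x_1), ..., f(x_{l+u})]^T.

```latex
f^{*} = \arg\min_{f \in \mathcal{H}_K}
  \frac{1}{l} \sum_{i=1}^{l} V(x_i, y_i, f)
  + \gamma_A \, \| f \|_K^{2}
  + \frac{\gamma_I}{(u+l)^{2}} \, \mathbf{f}^{T} L \, \mathbf{f}

f^{*}(x) = \sum_{i=1}^{l+u} \alpha_i^{*} \, K(x_i, x)
```

The third term is the graph smoothness term from the earlier slides, and the expansion runs over both labeled and unlabeled points.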

Page 30

Results on real data

(Belkin et al., 2006)

Page 31

Summary

•  Learning on graphs provides a set of powerful techniques for data analysis and predictive analytics that “understand” the geometry of underlying data

•  Spectral clustering – addresses a limitation of traditional K-means, which tends to fail on non-convex clusters

•  Laplacian eigenmap & manifold regularization – learn a label function respecting underlying data geometry, and hence provide benefits over standard methods like PCA and linear regression

•  Lots of other approaches as well – will talk about label propagation based on graphs later in this presentation

Page 32

Outline

•  Graph based machine learning
   –  Basic structures
   –  Algorithms
   –  Examples

•  Applications in media analytics
   –  Social analysis of videos
   –  Content analysis of images

Page 33

Applications in media analytics

•  High-level analysis: social relational inference (people to communities)
•  Mid-level analysis: event detection (visual features to events)
•  Low-level analysis: segmentation (pixels to semantic objects)

Page 34

Application 1: social analysis of multimedia data

Friends or foes? Acquaintances or strangers? In same or different teams?

Page 35

Social network learning and analysis

Page 36

Social network learning and analysis

Page 37

Social network learning and analysis

(Ding & Yilmaz, 2010; 2011)

Page 38

Application areas

•  Social content: given the growing popularity of social media, inferring relations among people is becoming important

•  Visual recognition: social context is shown to help improve recognition results from images (e.g. Wang et al., ECCV 2010)

•  Surveillance: social network learning and analysis for surveillance applications (e.g. Yu et al., CVPR 2009)

•  Sociology: necessary step in building intelligent systems for aiding sociological discovery

Page 39

Basic video processing

•  Videos segmented into semantic segments
   –  Scenes, or visually coherent sets of shots, for movies and TV shows
   –  Shot detection and merging based on key-frame similarity (Rasheed & Shah, 2003)

•  Identifying the actors appearing in each segment
   –  Using scripts and closed captions for movies
   –  Face detection and recognition for other videos

Page 40

Actor appearance matrix
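The figure for this slide is not in the transcript. One plausible reading, stated here as an assumption, is a binary matrix with one row per actor and one column per scene, filled in from the actor identification step. The actor names and entries below are invented for illustration.

```python
import numpy as np

actors = ["A", "B", "C"]  # hypothetical actor names
# Rows are actors, columns are scenes; 1 means the actor appears in the scene.
appearance = np.array([[1, 1, 0, 1],
                       [1, 0, 0, 1],
                       [0, 0, 1, 0]])

# Entry (i, j) counts the scenes in which actors i and j both appear;
# the diagonal counts the scenes each actor appears in.
co_occurrence = appearance @ appearance.T
```

Co-occurrence counts alone do not say whether two actors are on the same side; in the talk that information comes from the per-scene grouping cues.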

Page 41

Overall process

•  Stages: feature observations, grouping cues, event estimates, scene models, social relations
•  Levels: frame-level, scene-level, video-level
•  Scene models: dynamic systems represent scenes
•  Event estimates: estimate the likely events in a scene
•  Grouping cues: a number in [-1, +1] for each scene, positive if the actors in a scene are likely in the same community, negative otherwise

Page 42

Key steps

Page 43

Visual features

•  Generic optical flow orientation histogram
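A sketch of what an optical flow orientation histogram can look like, given a flow field that has already been computed. The number of bins, the magnitude weighting and the normalization are assumptions made for the example; the slide does not specify them.

```python
import numpy as np

def flow_orientation_histogram(u, v, bins=8):
    """Magnitude-weighted histogram of flow directions, normalized to sum to 1."""
    angles = np.arctan2(v, u)        # flow direction in [-pi, pi]
    magnitudes = np.hypot(u, v)      # flow speed, used as the vote weight
    hist, _ = np.histogram(angles, bins=bins, range=(-np.pi, np.pi),
                           weights=magnitudes)
    total = hist.sum()
    return hist / total if total > 0 else hist

# Toy flow field (illustrative): every pixel moves right and slightly up.
u = np.full((4, 4), 1.0)   # horizontal component
v = np.full((4, 4), 0.5)   # vertical component
hist = flow_orientation_histogram(u, v)
```

All of the mass lands in the single bin covering angles between 0 and pi/4, since every flow vector points in the same direction.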

Page 44

Auditory features

Page 45

Using visual concepts

•  Visual concept detection provides useful semantic features for inferring social relations
•  Using Columbia’s 374 SVM concept detectors on color/texture/edge features, a concept score vector is generated for each scene

Page 46

Evidence synthesis by Gaussian processes

Page 47

Learned social affinity

•  The learned social network is represented by an affinity matrix K

Page 48

Learned social networks

Page 49

RACOM dataset

•  Ten example movies: (1) G.I. Joe: The Rise of Cobra (2009); (2) Harry Potter and the Half-Blood Prince (2009); (3) Public Enemies (2009); (4) Troy (2004); (5) Braveheart (1995); (6) Year One (2009); (7) Coraline (2009); (8) True Lies (1994); (9) The Chronicles of Narnia: The Lion, the Witch and the Wardrobe (2005); (10) The Lord of the Rings: The Return of the King (2003).

Page 50

Analyzing social networks

•  We extend the max-min modularity principle so that it works with the learned social networks, in order to detect the two communities for each movie

•  We also identify the leaders of each community, who, interestingly, correspond to the hero/villain most of the time
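The max-min extension itself is not spelled out in the transcript. As a point of reference only, the sketch below splits a network into two communities with the standard Newman modularity and its leading-eigenvector method, which is a different and simpler technique than the one used in the talk. The function name and the toy affinity matrix are illustrative.

```python
import numpy as np

def modularity_bipartition(K):
    """Two-community split by the leading eigenvector of the modularity matrix."""
    d = K.sum(axis=1)                    # (weighted) degrees
    two_m = d.sum()                      # twice the total edge weight
    B = K - np.outer(d, d) / two_m       # modularity matrix
    eigenvalues, eigenvectors = np.linalg.eigh(B)
    leading = eigenvectors[:, -1]        # eigenvector of the largest eigenvalue
    s = np.where(leading >= 0, 1, -1)    # community membership as +1 / -1
    Q = float(s @ B @ s) / (2.0 * two_m) # modularity of this split
    return s, Q

# Toy affinity matrix: two triangles joined by the single edge 2 - 3.
K = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    K[i, j] = K[j, i] = 1.0

s, Q = modularity_bipartition(K)
```

On this toy graph the split recovers the two triangles; the +1 / -1 assignment of the two sides is arbitrary.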

Page 51

Max-min modularity

Page 52

Visual maps

Page 53

Quantitative evaluation

Page 54

Detected social communities

Page 55

YouTube dataset

•  10 videos of soccer games; 10 videos of demonstrations
•  The goal here is to predict a grouping cue for each scene. We evaluate against ground truth labeling

Page 56

YouTube results

•  Event categories are considered and labeled in an intermediate step
   –  Soccer: (chasing, confronting, hugging, others)
   –  Demonstration: (marching, confronting, public speaking, others)

•  Precision (+) for within-community instances and Precision (-) for across-community instances are reported separately
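A sketch of how the two precision figures could be computed, assuming each predicted grouping cue is reduced to its sign (+1 for within-community, -1 for across-community) and that the usual per-class definition of precision is meant. The slide does not give the exact definition, and the labels below are invented.

```python
def precision_by_sign(predicted, truth):
    """Precision for the +1 (within-community) and -1 (across-community) classes."""
    result = {}
    for sign in (+1, -1):
        # Ground-truth labels of the scenes that were predicted as `sign`.
        claimed = [t for p, t in zip(predicted, truth) if p == sign]
        if claimed:
            result[sign] = sum(t == sign for t in claimed) / len(claimed)
        else:
            result[sign] = float("nan")  # no scene was predicted as this class
    return result

# Invented example: five scenes with predicted and ground-truth signs.
predicted = [+1, +1, -1, -1, +1]
truth = [+1, -1, -1, -1, +1]
scores = precision_by_sign(predicted, truth)
```

Here two of the three scenes predicted as +1 are correct, and both scenes predicted as -1 are correct.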

Page 57

Application 2: image content analysis

•  Interactive whole-object segmentation
   –  Inputs: an image & labeled pixels (seeds) for objects/background
   –  Outputs: labels for all other pixels

(Ding & Yilmaz, 2010)

Page 58

Overview

•  To segment whole objects from images given user-supplied seeds
   –  Different from unsupervised segmentation from a single image, which typically generates homogeneous regions
   –  The challenge is to segment objects using a small number of seeds
•  In addressing this problem, we have proposed
   –  Probabilistic hypergraph image model (PHIM)
   –  Automatic label set augmentation using boundary features
   –  Multiple view learning synthesizing features

Page 59

Graphs vs. hypergraphs

•  Graph based approaches have been popular for interactive segmentation
   –  Graph cut (Rother et al., 2004)
   –  Random walk (Grady, 2006)

•  Hypergraphs vs. graphs for images
   –  Higher order relations among pixels that tend to form a segment are encoded as hyperedges, which are collections of vertices
   –  Model long-range dependencies among the entities (known and unknown labels)

Page 60

Our model: PHIM

•  We propose to use a probabilistic hypergraph image model (PHIM)
   –  The relation between a hyperedge and a vertex is probabilistic, based on probabilities learned from image appearance characteristics

•  Vertices: superpixels
•  Hyperedges: pair-wise + higher-order (generated by mean-shift weak segmentation with varying color bandwidths)

Page 61

Our model: PHIM (cont’d)

•  Feature vector F_s of a superpixel s contains average LUV color values
•  Incidences: kernel density estimator taking superpixel features as the input

•  Hyperedge weights: inhomogeneous hyperedges are down-weighted
   –  Reduces to standard graph based edge weights when the hyperedge is of size 2

Page 62

Laplacians on PHIM

•  Normalized Laplacians on PHIM: the induced quadratic form measures the smoothness of a function with respect to the underlying edge system
   –  We use probabilistic incidences h(v, e) in defining Laplacians on PHIM

•  Notations
   –  f: vector of function values on vertices (+1 for object; -1 for background)
   –  H: probabilistic incidence matrix; W: hyperedge weight matrix
   –  D_e: hyperedge degree matrix; D_v: vertex degree matrix
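The Laplacian formula itself is missing from the transcript. A common normalized hypergraph Laplacian built from exactly these ingredients is I - D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2}, with vertex degrees d(v) = sum_e w(e) h(v, e) and hyperedge degrees delta(e) = sum_v h(v, e). The sketch below assumes that form, with real-valued incidences in place of 0/1 entries; whether PHIM uses precisely this definition is not confirmed by the slide. The incidence values and weights are invented.

```python
import numpy as np

def hypergraph_laplacian(H, w):
    """Normalized hypergraph Laplacian for incidence matrix H and hyperedge weights w."""
    Dv = H @ w                 # vertex degrees: sum_e w(e) * h(v, e)
    De = H.sum(axis=0)         # hyperedge degrees: sum_v h(v, e)
    Dv_inv_sqrt = 1.0 / np.sqrt(Dv)
    # Theta = Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}
    theta = (Dv_inv_sqrt[:, None] * H) @ np.diag(w / De) @ (H.T * Dv_inv_sqrt[None, :])
    return np.eye(H.shape[0]) - theta

# Invented example: 4 vertices (rows), 3 hyperedges (columns).
# Entries are incidence probabilities h(v, e) in [0, 1].
H = np.array([[1.0, 0.0, 0.9],
              [1.0, 0.0, 0.8],
              [0.0, 1.0, 0.7],
              [0.0, 1.0, 0.1]])
w = np.array([1.0, 1.0, 0.5])   # hyperedge weights

Delta = hypergraph_laplacian(H, w)
```

With this definition the matrix is symmetric and positive semi-definite, so f^T Delta f can play the same smoothness role as f^T L f on an ordinary graph.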

Page 63

How to do segmentation

•  Constrained smoothness minimization
   –  Essentially an interpolation, as we have confidence in user-supplied segment labels

•  This interpolation can also be solved in an iterative manner using the natural random walk
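Both routes can be sketched on an ordinary graph. This is a simplification: the talk works with the PHIM Laplacian, while the code below uses the unnormalized graph Laplacian and a 6-vertex path as a stand-in. Minimizing f^T L f with the seed values held fixed gives the linear system L_uu f_u = -L_ul f_l for the unlabeled vertices. The iterative version repeatedly replaces each unlabeled value with the random-walk average of its neighbors. Function names and data are illustrative.

```python
import numpy as np

def interpolate_labels(L, seed_idx, seed_values):
    """Minimize f^T L f subject to f[seed_idx] = seed_values (direct solve)."""
    n = L.shape[0]
    seed_idx = np.asarray(seed_idx)
    free_idx = np.setdiff1d(np.arange(n), seed_idx)
    f = np.zeros(n)
    f[seed_idx] = seed_values
    # Zero gradient with respect to the free entries: L_uu f_u = -L_ul f_l.
    L_uu = L[np.ix_(free_idx, free_idx)]
    L_ul = L[np.ix_(free_idx, seed_idx)]
    f[free_idx] = np.linalg.solve(L_uu, -L_ul @ f[seed_idx])
    return f

def interpolate_by_random_walk(W, seed_idx, seed_values, iterations=300):
    """Same interpolation, reached by iterating neighbor averaging."""
    P = W / W.sum(axis=1, keepdims=True)   # random-walk transition matrix
    f = np.zeros(W.shape[0])
    f[seed_idx] = seed_values
    for _ in range(iterations):
        f = P @ f                  # each vertex takes the average of its neighbors
        f[seed_idx] = seed_values  # seeds stay clamped to the user labels
    return f

# Path graph 0 - 1 - 2 - 3 - 4 - 5; vertex 0 is an object seed, vertex 5 a background seed.
n = 6
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
L = np.diag(W.sum(axis=1)) - W

f_direct = interpolate_labels(L, [0, 5], [1.0, -1.0])
f_iter = interpolate_by_random_walk(W, [0, 5], [1.0, -1.0])
segment = np.sign(f_direct)   # +1 for object, -1 for background
```

On the path the interpolated values fall linearly from +1 to -1, and thresholding at zero assigns the first three vertices to the object and the last three to the background.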

Page 64

Dataset

•  GrabCut dataset of 50 images (Rother et al., 2004)
•  Seed pixels are provided in the form of trimaps
•  Ground-truth segmentations are supplied

Page 65

Results on segmentation

•  Error rates averaged over the GrabCut dataset of 50 images
   –  PHIM performs better than a standard graph
   –  Our error rate of 5.33% is lower than the 7.9% achieved in (Blake et al., 2006), and is comparable to state-of-the-art results from pixel-level optimization

Page 66

Comparative results

Page 67

The end

•  Thanks!
•  References
   –  Ulrike von Luxburg, A Tutorial on Spectral Clustering, 2007
   –  Charles Sutton and Andrew McCallum, An Introduction to Conditional Random Fields for Relational Learning, 2007
   –  Raif Rustamov, Laplace-Beltrami Eigenfunctions for Deformation Invariant Shape Representation, 2007
   –  Mikhail Belkin and Partha Niyogi, Laplacian Eigenmaps for Dimensionality Reduction and Data Representation, 2003
   –  Mikhail Belkin and Partha Niyogi, Semi-Supervised Learning on Riemannian Manifolds, 2004
   –  Mikhail Belkin, Partha Niyogi and Vikas Sindhwani, Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples, 2006
   –  Mikhail Belkin and Partha Niyogi, Convergence of Laplacian Eigenmaps, 2008
   –  Lei Ding and Alper Yilmaz, Learning Relations Among Movie Characters: A Social Network Perspective, 2010
   –  Lei Ding and Alper Yilmaz, Interactive Image Segmentation Using Probabilistic Hypergraphs, 2010
   –  Lei Ding and Alper Yilmaz, Inferring Social Relations from Visual Concepts, 2011