Post on 20-Dec-2015

Commentary-based Video Categorization and Concept Discovery

By Janice Leung

Agenda

Introduction to Video Sharing Sites
Current Problem
Previous Works
Commentary-based Video Clustering
Conclusion
Future Works

Video Sharing Sites

Allow users to upload videos
Share videos worldwide
Examples: Dailymotion, YouTube, MySpace

De Facto

YouTube
  More than 65,000 new videos every day
  100 million video views daily
  20 million unique visitors per month

Immense number of videos

Incredible growth in the number of videos

How to search for a desired video?

YouTube: tags + simple categorization

YouTube

Predefined categories
Videos
  Title, description, tags, category: provided by the one who uploads the video
  Comments: provided by many users

Related Works

Classify videos:
  Video features: color, grayscale histogram, pixel information
  Keywords from description
  Tags
Find user interests:
  Object fetching information
  Tags

Problems

Video features
  Cannot tell exactly what the video is about
  No user interest is considered
Keywords from description
  Description is provided by the one who uploaded the video
  Not sufficient information

Problems (Cont.)

Tags
  Not sufficient information
  May reflect users' feelings about videos but are too brief to represent the complex ideas of the videos
Object fetching information
  Reflects users' interests but gives no information about the videos at all

Video Categorization and Concept Discovery

Site: YouTube
Videos: involving Hong Kong singers

Comment vs Tag

Comments
  Given by many users
  Can be large in number
  Express users' opinions
  Rich words describe fine-grained ideas
Tags
  Given by only one person (the one who uploaded the video)
  Few tags
  Describe the video in a very brief way
    Singer name
    Song name

Comments

Include:
  Video content
    Music styles
    Music ages
  Singer description
    Appearance
    Style
    News
    etc.

Commentary-based Video Categorization

Objective: categorize videos based on user interests and discover the concepts of videos

Cluster videos by using comments
  Group videos based on user interests
  Find video concepts
Clustering algorithm: multi-assignment NMF

Video clustering

Bi-clustering: videos and words
Clusters videos and words into k groups by matrix factorization
Video-word matrix X as input
Video-word matrix X is derived by tf-idf

Tf-idf

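Term frequency (tf)
Suppose there are t distinct terms in document j

$$ tf_{i,j} = \frac{f_{i,j}}{\sum_{l=1}^{t} f_{l,j}} $$

where $f_{i,j}$ is the number of occurrences of term i in document j

Tf-idf (Cont.)

Inverse document frequency (idf)

$$ idf_i = \log \frac{N}{n_i} $$

where N is the total number of documents in the dataset and $n_i$ is the number of documents containing term i

Tf-idf (Cont.)

Importance weight of term i to document j

$$ tfidf_{i,j} = tf_{i,j} \times idf_i $$

Matrix X as input to NMF is defined as

$$ X = [x_{j,i}], \quad x_{j,i} = tfidf_{i,j} $$

with one row per video (document) j and one column per term i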
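A minimal sketch of how such a video-word tf-idf matrix could be built, using only the Python standard library. It assumes that all comments of a video are pooled into one tokenized document and that the natural logarithm is used; the function name `tfidf_matrix` and the toy comments are hypothetical and not taken from the slides.

```python
import math
from collections import Counter

def tfidf_matrix(docs):
    """Build a document-term tf-idf matrix from tokenized documents.

    docs: list of token lists, one per video (its pooled comments).
    Returns (vocab, X) where X[j][i] = tf(i, j) * idf(i).
    """
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # n_i: number of documents that contain term i
    doc_freq = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n_docs / doc_freq[term]) for term in vocab}

    X = []
    for doc in docs:
        counts = Counter(doc)
        total = sum(counts.values())  # total term occurrences in document j
        row = [(counts[t] / total) * idf[t] if total else 0.0 for t in vocab]
        X.append(row)
    return vocab, X

# Hypothetical toy comments, already tokenized (one list per video).
docs = [
    ["great", "voice", "classic", "song"],
    ["classic", "song", "old", "days"],
    ["cute", "handsome", "voice"],
]
vocab, X = tfidf_matrix(docs)
```

A term that appears in every document gets idf = log(1) = 0, so it contributes nothing to X; this is what lets tf-idf suppress words that are common to all videos.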

Video Clustering (Cont.)

Decompose X into non-negative matrices W and H by minimizing

$$ J = \frac{1}{2} \lVert X - WH \rVert_F^2 $$

where

$$ W \ge 0, \quad H \ge 0 $$

Ref.: Document Clustering Based on Non-negative Matrix Factorization (Xu et al., SIGIR '03)

Video Clustering (Cont.)

NMF decomposition for video clustering
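The decomposition of X into W and H can be sketched with the well-known multiplicative update rules for the squared Frobenius loss. This is an illustrative assumption: the slides do not say which optimizer was used, and the helper names (`matmul`, `frob_loss`, `nmf`) and the toy matrix are hypothetical. X is taken to have one row per video and one column per term.

```python
import random

def matmul(A, B):
    """Plain-Python matrix product of A (n x p) and B (p x m)."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frob_loss(X, W, H):
    """Half the squared Frobenius norm of X - WH."""
    WH = matmul(W, H)
    return 0.5 * sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry))

def nmf(X, k, iters=200, seed=0, eps=1e-9):
    """Factor a non-negative X (N x M) into W (N x k) and H (k x M)."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() for _ in range(k)] for _ in range(n)]
    H = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H

# Hypothetical toy video-word matrix with two visible groups.
X = [
    [1.0, 0.9, 0.0, 0.0],
    [0.8, 1.0, 0.1, 0.0],
    [0.0, 0.1, 1.0, 0.9],
    [0.0, 0.0, 0.9, 1.0],
]
W, H = nmf(X, k=2)
```

Because the updates only multiply by non-negative ratios, W and H stay non-negative as long as they start that way. For a corpus of realistic size a numerical library would be a more practical choice than nested lists.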

Video Clustering (Cont.)

Suppose
  Number of videos: N
  Number of distinct terms: M
  Threshold: β
W in size N x K
  $w_{n,k}$: coefficient indicating how strongly video n belongs to cluster k

Video-cluster assignment

Videos can belong to multiple groups
Multi-cluster assignment
Video n belongs to cluster k if

$$ w_{n,k} > \beta $$

Set of clusters that video n belongs to:

$$ C_n = \{\, k \in K : w_{n,k} > \beta \,\} $$

where K is the set of all clusters
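The multi-cluster assignment can be sketched in a few lines. The sketch assumes a strict comparison against a single threshold β (the slides do not show whether the inequality is strict); the function name `assign_clusters` and the coefficient values are hypothetical.

```python
def assign_clusters(W, beta):
    """Return, for each video n, the set of clusters k with W[n][k] > beta."""
    return [{k for k, w in enumerate(row) if w > beta} for row in W]

# Hypothetical coefficients: 3 videos (rows) by 3 clusters (columns).
W = [
    [0.70, 0.30, 0.00],
    [0.45, 0.50, 0.05],
    [0.10, 0.20, 0.70],
]
clusters = assign_clusters(W, beta=0.4)
```

Here the second video exceeds the threshold for two clusters and is therefore assigned to both, which is the difference from a hard assignment that picks only the largest coefficient.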

Video-cluster assignment (Cont.)

Threshold, β
  Many irrelevant videos for each cluster
  Coefficient distribution varies for different clusters
  Threshold should depend on the coefficient distribution
  Different for different clusters

Concept Discovery

Matrix H in size of K x M
  $h_{k,m}$: how likely term m belongs to cluster k
Terms that belong to a cluster describe the videos in that cluster
Concept words of cluster k videos
  Top 10 words of cluster k
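Reading concept words off H amounts to ranking the terms of each row by coefficient. A minimal sketch, assuming H is stored as a list of K rows with one value per vocabulary term; `concept_words`, the vocabulary and the numbers are hypothetical.

```python
def concept_words(H, vocab, top=10):
    """For each cluster k, return the `top` terms with the largest H[k][m]."""
    result = []
    for row in H:
        ranked = sorted(range(len(vocab)), key=lambda m: row[m], reverse=True)
        result.append([vocab[m] for m in ranked[:top]])
    return result

# Hypothetical vocabulary and term-cluster coefficients (2 clusters).
vocab = ["classic", "cute", "handsome", "old", "song", "voice"]
H = [
    [0.9, 0.0, 0.1, 0.7, 0.8, 0.3],
    [0.1, 0.9, 0.8, 0.0, 0.2, 0.6],
]
words = concept_words(H, vocab, top=3)
```

With these toy numbers the first cluster is described by words about the song and the second by words about the singer's appearance and voice, mirroring the two kinds of comment content listed earlier.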

Experiment

19305 videos
102 Hong Kong singers
7271 users
Number of clusters, k: 20

Experiment (Cont.)

Threshold, β
  Depends on the coefficient distribution
  Threshold for cluster i is defined as the mean coefficient of the cluster

$$ \beta_i = \frac{1}{N} \sum_{n=1}^{N} w_{n,i} $$

Experiment (Cont.)

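Video coefficients may be distributed in an extremely uneven manner
  Causes poor results
To compensate, the threshold can be set as the mean plus one standard deviation

$$ \beta_i = \mu_i + \sigma_i $$

where $\mu_i$ and $\sigma_i$ are the mean and standard deviation of the video coefficients of cluster i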
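The two per-cluster thresholds (mean, and mean plus one standard deviation) can be compared on a small example. This is a sketch under assumptions: the sample standard deviation (`statistics.stdev`) is used because the slides do not say which variant was meant, and the coefficient matrix and function names are hypothetical.

```python
from statistics import mean, stdev

def cluster_thresholds(W, use_sd=False):
    """One threshold per cluster: column mean, optionally plus the sample SD."""
    cols = list(zip(*W))
    return [mean(c) + (stdev(c) if use_sd else 0.0) for c in cols]

def assign(W, betas):
    """Video n joins cluster k when W[n][k] exceeds that cluster's threshold."""
    return [{k for k, (w, b) in enumerate(zip(row, betas)) if w > b} for row in W]

# Hypothetical coefficients: 4 videos (rows) by 2 clusters (columns).
W = [
    [0.80, 0.20],
    [0.60, 0.40],
    [0.30, 0.70],
    [0.10, 0.90],
]
mean_betas = cluster_thresholds(W)             # about [0.45, 0.55]
sd_betas = cluster_thresholds(W, use_sd=True)  # about [0.76, 0.86]
loose = assign(W, mean_betas)
strict = assign(W, sd_betas)
```

With the mean threshold every video lands in a cluster; with mean plus SD only the videos with the strongest coefficients remain, which trims weakly related videos at the cost of leaving some videos unassigned.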

Experiment (Cont.)

                 C1     C2     C3
V1               0.70   0.30   0.00
V2               0.31   0.64   0.05
V4               0.13   0.22   0.65
V3               0.01   0.64   0.35
V5               0.12   0.23   0.65

Mean Coef.       0.33   0.406  0.254
Mean + SD Coef.  0.631  0.622  0.526

Concept Words vs Tags

Percentage of videos with tags covering concept words across groups

Singer Relationship Discovery

Comments on videos may talk about singers
  Singer styles, appearance, news
Singer clustering using comments
  Reveals relationships between singers
  Discovers hidden phenomena

Conclusion

Captures user interests more accurately and fairly than human-predefined categories
Categories can change dynamically as user interests change over time
Obtains clusters with fine-grained ideas
Eases the task of video search by categorizing videos and refining the index

Future Works

Extend to user clustering
Obtain relationships among videos, singers and users of the entire social network
Study the social culture
Ease the job of advertising to target customers
Connect people who share the same interests

Q & A

Questions?