final ir lab project - carnegie mellon...
TRANSCRIPT
1
Merging Rank Lists from Multiple Sources in Video Classification
Wei-Hao Lin
Language Technologies Institute
School of Computer Science
Carnegie Mellon University
Outline
• Problem Definition
• Motivations and Related Works
• Local-to-Global Functions
• Experiments
• Conclusions
2
Problem Definition
Find video segments containing “Madeleine Albright”
CNN
ABC
C-SPAN
[Figure: the rank lists from the three sources are combined into a single merged list by Merging Method 1, 2, or 3]
3
Motivations
• There may exist source-specific characteristics to be exploited.
• Training on the full data may be more expensive than training on a source-by-source basis.
• Not all sources may provide the same type of features, which makes training on the full data impossible.
Related Works
• Unlike meta-search, there are no overlapping documents between sources; every document (shot) appears only once, in one of the rank lists.
• Unlike a distributed IR environment, all information about the rank lists is available.
• Text-specific methods are not applicable in the multimedia domain.
4
Local-to-Global Functions
• Obtain the local score from each individual source: Li(d)
• Map the local score to the global score: g(d) = m(Li(d))
• Sort the final list in decreasing order of g(d)
[Figure: local scores L1(di), L2(di) from sources S1 and S2 are mapped to global scores g(di) and sorted into the merged list]
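The three steps above can be sketched in Python as follows; the dict-based sources and the identity mapping are illustrative stand-ins, not the talk's actual classifiers or learned mapping functions:

```python
def merge_rank_lists(sources, mapping):
    """Merge per-source rank lists into one global list.

    sources: list of dicts mapping document id -> local score Li(d)
    mapping: function turning a local score into a global score g(d)
    """
    merged = []
    for local_scores in sources:
        for doc, local in local_scores.items():
            merged.append((doc, mapping(local)))
    # Sort the final list in decreasing order of the global score g(d)
    merged.sort(key=lambda pair: pair[1], reverse=True)
    return merged

# Raw Score as the trivial mapping: local scores used unchanged
identity = lambda s: s
ranked = merge_rank_lists([{"d1": 0.9, "d2": 0.1}, {"d3": 0.5}], identity)
# -> [("d1", 0.9), ("d3", 0.5), ("d2", 0.1)]
```

The methods that follow differ only in how `mapping` (and, for rank-based methods, the notion of score) is chosen.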
Merging Methods
• No mapping functions, i.e., local scores are used directly as global scores
  • Rank-based local score functions: Round Robin
  • Score-based local score functions: Raw Score, Linear Scaling
• Learning mapping functions
  • Learning with labels
  • Learning with optimal rank lists
5
Round Robin
• Works best when classifiers perform similarly in terms of ranking ability.
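Round Robin can be sketched as interleaving the source lists, taking the top remaining item of each source in turn (a generic illustration, not code from the talk):

```python
from itertools import chain, zip_longest

def round_robin(rank_lists):
    """Interleave rank lists: take the next item of each source in turn.

    Rank-based: only local ranks, never scores, determine the merged order."""
    _SKIP = object()  # filler for exhausted (shorter) lists
    interleaved = chain.from_iterable(zip_longest(*rank_lists, fillvalue=_SKIP))
    return [doc for doc in interleaved if doc is not _SKIP]

merged = round_robin([["a1", "a2", "a3"], ["b1", "b2"]])
# -> ["a1", "b1", "a2", "b2", "a3"]
```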
Raw Score
• Works best when all sources are classified using the same classification algorithm.
6
Linear Scaling
• A crude way to normalize local scores; works best when the local scores differ only in scale.
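Linear Scaling amounts to min-max normalization of each source's scores into a common range; a minimal sketch:

```python
def linear_scale(scores):
    """Linearly scale one source's local scores into [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:                      # degenerate case: all scores equal
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

print(linear_scale([2.0, 4.0, 3.0]))  # [0.0, 1.0, 0.5]
```

This removes differences in scale and offset between sources, but not differences in score distribution shape.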
Learning with Labels
• Logistic regression is the mapping function.
• Transforms the scores into (calibrated) probabilities.
• Parameter estimation:
  • X: local scores from each test fold of 5-fold cross-validation on the training set
  • Y: binary labels
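A pure-Python sketch of fitting the logistic mapping by gradient descent; the toy scores and labels below are made up for illustration, whereas the talk estimates parameters from held-out cross-validation folds:

```python
import math

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit P(y=1 | x) = 1 / (1 + exp(-(a*x + b))) by batch gradient descent.

    xs: local scores (X above), ys: binary labels in {0, 1} (Y above)."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        ga = gb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a * x + b)))
            ga += (p - y) * x      # gradient of log-loss w.r.t. a
            gb += (p - y)          # gradient of log-loss w.r.t. b
        a -= lr * ga / n
        b -= lr * gb / n
    return a, b

# Toy data: high local scores are mostly positive, low mostly negative
a, b = fit_logistic([-2.0, -1.5, -1.0, 1.0, 1.5, 2.0], [0, 0, 0, 1, 1, 1])
calibrate = lambda s: 1.0 / (1.0 + math.exp(-(a * s + b)))
```

The fitted `calibrate` is then the mapping m(·) that turns each source's local scores into comparable probabilities.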
7
Learning with Optimal Rank Lists
• Removes the noise from positive shots that are ranked low.
• Parameter estimation:
  • X: local scores from each test fold of the 5-fold cross-validation on the training set
  • Y: if d is in the first α% of the optimal rank list, use its label; otherwise set the label to -1 (negative example).
Optimal Rank Lists
• The upper bound on MAP achievable by merging rank lists from multiple sources while preserving the within-source order.
• Finding the optimal list can be formulated as a search problem.
• The optimal list is approximated here using greedy search, due to limited computational resources.
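One possible greedy approximation is sketched below. The greedy criterion used here (emit a relevant head whenever one is available, otherwise advance the source closest to its next relevant item) is an assumption for illustration; the talk does not spell out its criterion:

```python
def greedy_optimal_merge(lists, relevant):
    """Order-preserving merge approximating the AP-optimal list.

    lists: per-source rank lists (order within each source is preserved).
    relevant: set of relevant document ids."""
    lists = [list(l) for l in lists]   # copy so callers' lists survive
    merged = []
    while any(lists):
        nonempty = [l for l in lists if l]
        rel_heads = [l for l in nonempty if l[0] in relevant]
        if rel_heads:                  # a relevant head exists: emit it
            merged.append(rel_heads[0].pop(0))
            continue
        def cost(l):                   # distance to the next relevant item
            for i, d in enumerate(l):
                if d in relevant:
                    return i
            return len(l)
        merged.append(min(nonempty, key=cost).pop(0))
    return merged

print(greedy_optimal_merge([["x1", "r1"], ["r2", "x2"]], {"r1", "r2"}))
# -> ['r2', 'x1', 'r1', 'x2']
```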
8
[Figure: the merge formulated as a search tree over sources S1 and S2; from the start state, branches choose which item (R1, R2, R3) to emit next, yielding partial merges such as R1R3 and R3R1]
Data
• TREC 2003 Video Track
  • ~120 hours of ABC World News Tonight and CNN Headline News from late Jan. to Jun. 1998
  • ~13 hours of C-SPAN programs from 1998–2001
9
Classification Tasks
• Weather News: the shot reports on the weather
• Sporting Event: the shot contains video of one or more organized sporting events
Text Features
• Closed captions, aligned with the shot boundaries using speech recognition transcripts
  • Vocabulary size: 6441
  • Median document length: ~9
• Stopword removal, Porter stemming
• Feature vector: term frequency of each word in the vocabulary
• SVM classifier with linear kernel
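Building such a term-frequency vector can be sketched as below; the stopword list and vocabulary are tiny stand-ins, and Porter stemming is omitted for brevity:

```python
from collections import Counter

STOPWORDS = {"the", "a", "of", "in"}   # stand-in for a real stopword list

def tf_vector(text, vocabulary):
    """Term-frequency feature vector over a fixed vocabulary.

    Out-of-vocabulary tokens simply contribute nothing."""
    tokens = [t for t in text.lower().split() if t not in STOPWORDS]
    counts = Counter(tokens)
    return [counts[w] for w in vocabulary]

vocab = ["weather", "storm", "game"]
print(tf_vector("The weather today: storm after storm", vocab))  # [1, 2, 0]
```

Each shot's vector would then be fed to the linear-kernel SVM.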
10
Color Features
• Color in HVC color space; only H & C are used.
• 125-bin histogram over 5-by-5 grids
• Feature vector: the mean and variance of each grid
• SVM classifier
Timing Features
• News reports and sports news usually come with strong timing cues.
• Training data are not well annotated, making discriminative learning harder.
• Class densities are modeled by kernel density estimation with the Sheather-Jones bandwidth selection method.
• Bayesian classifier
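A minimal Gaussian-kernel density estimator illustrating the idea; here the bandwidth is simply passed in rather than selected by the Sheather-Jones method, and the sample times are made up for illustration:

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a kernel density estimate built from 1-D samples."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))
    def density(x):
        # Sum of Gaussian bumps centered at each training sample
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                          for s in samples)
    return density

# Class-conditional densities of (normalized) broadcast time
p_pos = gaussian_kde([0.30, 0.32, 0.35], bandwidth=0.05)
p_neg = gaussian_kde([0.10, 0.50, 0.90], bandwidth=0.05)
```

A Bayesian classifier then compares p_pos(t) and p_neg(t), weighted by class priors, for a shot at time t.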
11
[Figure: shot labels (Positive/Negative) plotted against normalized broadcast time on a 0.0–1.0 axis]
[Figure: kernel density estimate of broadcast time for positive shots; density(x = subset(dat, label == "+1")$middle, width = "SJ"); N = 215, bandwidth = 0.00759]
12
Mean Average Precision (MAP)
• MAP of the rank list above, with relevant items at ranks 2 and 5: (1/2 + 2/5) / 2 = 0.45
• Ranges between ~0 and 1; the higher, the better.
• Favors high-ranked items.
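The average precision of a single rank list can be computed directly; the slide's example reproduces as:

```python
def average_precision(relevances):
    """AP of a ranked list; relevances[i] is 1 if the item at rank i+1 is relevant."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevances, start=1):
        if rel:
            hits += 1
            total += hits / rank   # precision at this relevant item's rank
    return total / max(hits, 1)

# Relevant items at ranks 2 and 5 -> (1/2 + 2/5) / 2 = 0.45
print(average_precision([0, 1, 0, 0, 1]))  # 0.45
```

MAP is then the mean of AP over all queries (here, classification tasks).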
Random Baseline
• Given m relevant items in a list of n items:
  • Randomly assign the m items, calculate MAP, repeat, and take the average.
  • Or calculate E[AP] directly.
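The direct calculation can be illustrated by brute-force enumeration of all placements, which is exact but feasible only for small n (the AP helper is repeated so the sketch is self-contained; the talk's "directly" presumably refers to a closed form):

```python
from itertools import combinations

def average_precision(relevances):
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevances, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / max(hits, 1)

def expected_ap(n, m):
    """E[AP] when m relevant items land uniformly among n positions."""
    placements = list(combinations(range(n), m))   # all C(n, m) placements
    total = 0.0
    for positions in placements:
        rel = [1 if i in positions else 0 for i in range(n)]
        total += average_precision(rel)
    return total / len(placements)

print(expected_ap(2, 1))  # (1.0 + 0.5) / 2 = 0.75
```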
13
Experiments
Research Questions
• Which merging method works best?
• Under what conditions does one method perform better than the others?
• Is merging rank lists always better than putting everything together?
[Slides 14–18: results charts, not recoverable from the transcript]
Conclusions
• Merging methods matter.
• Logisreg with the optimal merged rank list and Logisreg with labels are among the best methods.
• Raw Score only works when the classifiers are the same across sources.
• Merging helps when source-specific characteristics exist.