final ir lab project - carnegie mellon...
TRANSCRIPT
1
Merging Rank Lists from Multiple Sources in Video Classification
Wei-Hao Lin
Language Technologies Institute
School of Computer Science
Carnegie Mellon University
Outline
• Problem Definition
• Motivations and Related Works
• Local-to-Global Functions
• Experiments
• Conclusions
2
Problem Definition
Find video segments containing “Madeleine Albright”
CNN
ABC
C-SPAN
[Figure: the rank lists from the three sources are combined into a single merged list by Merging Method 1, 2, or 3]
3
Motivations
• There may exist source-specific characteristics to be exploited.
• Training on the full data may be more expensive than training on a source-by-source basis.
• Not all sources may provide the same type of features, which makes training on the full data impossible.
Related Works
• Unlike meta-search, there are no overlapping documents between sources; every document (shot) appears only once, in one of the rank lists.
• Unlike a distributed IR environment, all information about the rank lists is available.
• Text-specific methods are not applicable in the multimedia domain.
4
Local-to-Global Functions
• Obtain the local score from each individual source: Li(d)
• Map the local score to the global score: g(d) = m(Li(d))
• Sort the final list in decreasing order of g(d)
[Figure: local scores L1(di), L2(di) from sources S1 and S2 are mapped to global scores g(di) and sorted into the merged list]
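The three steps above can be sketched in Python as follows; the dict-based sources and the identity mapping are illustrative stand-ins, not the talk's actual classifiers or learned mapping functions:

```python
def merge_rank_lists(sources, mapping):
    """Merge per-source rank lists into one global list.

    sources: list of dicts mapping document id -> local score Li(d)
    mapping: function turning a local score into a global score g(d)
    """
    merged = []
    for local_scores in sources:
        for doc, local in local_scores.items():
            merged.append((doc, mapping(local)))
    # Sort the final list in decreasing order of the global score g(d)
    merged.sort(key=lambda pair: pair[1], reverse=True)
    return merged

# Raw Score as the trivial mapping: local scores used unchanged
identity = lambda s: s
ranked = merge_rank_lists([{"d1": 0.9, "d2": 0.1}, {"d3": 0.5}], identity)
# -> [("d1", 0.9), ("d3", 0.5), ("d2", 0.1)]
```

The methods that follow differ only in how `mapping` (and, for rank-based methods, the notion of score) is chosen.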
Merging Methods
• No mapping functions, i.e., local scores are used directly as global scores
  • Rank-based local score functions: Round Robin
  • Score-based local score functions: Raw Score, Linear Scaling
• Learning mapping functions
  • Learning with labels
  • Learning with optimal rank lists
5
Round Robin
• Works best when classifiers perform similarly in terms of ranking ability.
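Round Robin can be sketched as interleaving the source lists, taking the top remaining item of each source in turn (a generic illustration, not code from the talk):

```python
from itertools import chain, zip_longest

def round_robin(rank_lists):
    """Interleave rank lists: take the next item of each source in turn.

    Rank-based: only local ranks, never scores, determine the merged order."""
    _SKIP = object()  # filler for exhausted (shorter) lists
    interleaved = chain.from_iterable(zip_longest(*rank_lists, fillvalue=_SKIP))
    return [doc for doc in interleaved if doc is not _SKIP]

merged = round_robin([["a1", "a2", "a3"], ["b1", "b2"]])
# -> ["a1", "b1", "a2", "b2", "a3"]
```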
Raw Score
• Works best when all sources are classified using the same classification algorithm.
6
Linear Scaling
• A crude way to normalize local scores; works best when the local scores differ only in scale.
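Linear Scaling amounts to min-max normalization of each source's scores into a common range; a minimal sketch:

```python
def linear_scale(scores):
    """Linearly scale one source's local scores into [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:                      # degenerate case: all scores equal
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

print(linear_scale([2.0, 4.0, 3.0]))  # [0.0, 1.0, 0.5]
```

This removes differences in scale and offset between sources, but not differences in score distribution shape.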
Learning with Labels
• Logistic regression is the mapping function.
• Transforms the scores into (calibrated) probabilities.
• Parameter estimation:
  • X: local scores from each test fold of 5-fold cross-validation on the training set
  • Y: binary labels
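A pure-Python sketch of fitting the logistic mapping by gradient descent; the toy scores and labels below are made up for illustration, whereas the talk estimates parameters from held-out cross-validation folds:

```python
import math

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit P(y=1 | x) = 1 / (1 + exp(-(a*x + b))) by batch gradient descent.

    xs: local scores (X above), ys: binary labels in {0, 1} (Y above)."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        ga = gb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a * x + b)))
            ga += (p - y) * x      # gradient of log-loss w.r.t. a
            gb += (p - y)          # gradient of log-loss w.r.t. b
        a -= lr * ga / n
        b -= lr * gb / n
    return a, b

# Toy data: high local scores are mostly positive, low mostly negative
a, b = fit_logistic([-2.0, -1.5, -1.0, 1.0, 1.5, 2.0], [0, 0, 0, 1, 1, 1])
calibrate = lambda s: 1.0 / (1.0 + math.exp(-(a * s + b)))
```

The fitted `calibrate` is then the mapping m(·) that turns each source's local scores into comparable probabilities.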
7
Learning with Optimal Rank Lists
• Removes the noise from positive shots that are ranked low.
• Parameter estimation:
  • X: local scores from each test fold of the 5-fold cross-validation on the training set
  • Y: if d is in the first α% of the optimal rank list, use its label; otherwise set the label to -1 (negative example).
Optimal Rank Lists
• The upper bound on MAP achievable by merging rank lists from multiple sources while preserving the within-source order.
• Finding the optimal list can be formulated as a search problem.
• The optimal list is approximated here using greedy search, due to limited computational resources.
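One possible greedy approximation is sketched below. The greedy criterion used here (emit a relevant head whenever one is available, otherwise advance the source closest to its next relevant item) is an assumption for illustration; the talk does not spell out its criterion:

```python
def greedy_optimal_merge(lists, relevant):
    """Order-preserving merge approximating the AP-optimal list.

    lists: per-source rank lists (order within each source is preserved).
    relevant: set of relevant document ids."""
    lists = [list(l) for l in lists]   # copy so callers' lists survive
    merged = []
    while any(lists):
        nonempty = [l for l in lists if l]
        rel_heads = [l for l in nonempty if l[0] in relevant]
        if rel_heads:                  # a relevant head exists: emit it
            merged.append(rel_heads[0].pop(0))
            continue
        def cost(l):                   # distance to the next relevant item
            for i, d in enumerate(l):
                if d in relevant:
                    return i
            return len(l)
        merged.append(min(nonempty, key=cost).pop(0))
    return merged

print(greedy_optimal_merge([["x1", "r1"], ["r2", "x2"]], {"r1", "r2"}))
# -> ['r2', 'x1', 'r1', 'x2']
```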
8
[Figure: the merge formulated as a search tree over sources S1 and S2; from the start state, branches choose which item (R1, R2, R3) to emit next, yielding partial merges such as R1R3 and R3R1]
Data
• TREC 2003 Video Track
  • ~120 hours of ABC World News Tonight and CNN Headline News from late Jan. to Jun. 1998
  • ~13 hours of C-SPAN programs from 1998–2001
9
Classification Tasks
• Weather News: the shot reports on the weather
• Sporting Event: the shot contains video of one or more organized sporting events
Text Features
• Closed captions, aligned with the shot boundaries using speech recognition transcripts
  • Vocabulary size: 6441
  • Median document length: ~9
• Stopword removal, Porter stemming
• Feature vector: term frequency of each word in the vocabulary
• SVM classifier with linear kernel
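Building such a term-frequency vector can be sketched as below; the stopword list and vocabulary are tiny stand-ins, and Porter stemming is omitted for brevity:

```python
from collections import Counter

STOPWORDS = {"the", "a", "of", "in"}   # stand-in for a real stopword list

def tf_vector(text, vocabulary):
    """Term-frequency feature vector over a fixed vocabulary.

    Out-of-vocabulary tokens simply contribute nothing."""
    tokens = [t for t in text.lower().split() if t not in STOPWORDS]
    counts = Counter(tokens)
    return [counts[w] for w in vocabulary]

vocab = ["weather", "storm", "game"]
print(tf_vector("The weather today: storm after storm", vocab))  # [1, 2, 0]
```

Each shot's vector would then be fed to the linear-kernel SVM.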
10
Color Features
• Color in HVC color space; only H & C are used.
• 125-bin histogram over 5-by-5 grids
• Feature vector: the mean and variance of each grid
• SVM classifier
Timing Features
• News reports and sports news usually come with strong timing cues.
• Training data are not well annotated, making discriminative learning harder.
• Class densities are modeled by kernel density estimation with the Sheather-Jones bandwidth selection method.
• Bayesian classifier
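A minimal Gaussian-kernel density estimator illustrating the idea; here the bandwidth is simply passed in rather than selected by the Sheather-Jones method, and the sample times are made up for illustration:

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a kernel density estimate built from 1-D samples."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))
    def density(x):
        # Sum of Gaussian bumps centered at each training sample
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                          for s in samples)
    return density

# Class-conditional densities of (normalized) broadcast time
p_pos = gaussian_kde([0.30, 0.32, 0.35], bandwidth=0.05)
p_neg = gaussian_kde([0.10, 0.50, 0.90], bandwidth=0.05)
```

A Bayesian classifier then compares p_pos(t) and p_neg(t), weighted by class priors, for a shot at time t.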
11
[Figure: shot labels (Positive/Negative) plotted against normalized broadcast time on a 0.0–1.0 axis]
[Figure: kernel density estimate of broadcast time for positive shots; density(x = subset(dat, label == "+1")$middle, width = "SJ"); N = 215, bandwidth = 0.00759]
12
Mean Average Precision (MAP)
• MAP of the rank list above, with relevant items at ranks 2 and 5: (1/2 + 2/5) / 2 = 0.45
• Ranges between ~0 and 1; the higher, the better.
• Favors high-ranked items.
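The average precision of a single rank list can be computed directly; the slide's example reproduces as:

```python
def average_precision(relevances):
    """AP of a ranked list; relevances[i] is 1 if the item at rank i+1 is relevant."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevances, start=1):
        if rel:
            hits += 1
            total += hits / rank   # precision at this relevant item's rank
    return total / max(hits, 1)

# Relevant items at ranks 2 and 5 -> (1/2 + 2/5) / 2 = 0.45
print(average_precision([0, 1, 0, 0, 1]))  # 0.45
```

MAP is then the mean of AP over all queries (here, classification tasks).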
Random Baseline
• Given m relevant items in a list of n items:
  • Randomly assign the m items, calculate MAP, repeat, and take the average.
  • Or calculate E[AP] directly.
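The direct calculation can be illustrated by brute-force enumeration of all placements, which is exact but feasible only for small n (the AP helper is repeated so the sketch is self-contained; the talk's "directly" presumably refers to a closed form):

```python
from itertools import combinations

def average_precision(relevances):
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevances, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / max(hits, 1)

def expected_ap(n, m):
    """E[AP] when m relevant items land uniformly among n positions."""
    placements = list(combinations(range(n), m))   # all C(n, m) placements
    total = 0.0
    for positions in placements:
        rel = [1 if i in positions else 0 for i in range(n)]
        total += average_precision(rel)
    return total / len(placements)

print(expected_ap(2, 1))  # (1.0 + 0.5) / 2 = 0.75
```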
13
Experiments
Research Questions
• Which merging method works best?
• Under what conditions does one method perform better than the others?
• Is merging rank lists always better than putting everything together?
[Slides 14–18: results charts, not recoverable from the transcript]
Conclusions
• Merging methods matter.
• Logisreg with the optimal merged rank list and Logisreg with labels are among the best methods.
• Raw Score only works when the classifiers are the same across sources.
• Merging helps when source-specific characteristics exist.