CMU TDT Report TIDES PI Meeting 2002 The CMU TDT Team: Jaime Carbonell, Yiming Yang, Ralf Brown, Jian Zhang, Nianli Ma, Chun Jin Language Technologies Institute, CMU


TRANSCRIPT

Page 1: CMU TDT Report TIDES PI Meeting 2002

CMU TDT Report
TIDES PI Meeting 2002

The CMU TDT Team: Jaime Carbonell, Yiming Yang, Ralf Brown, Jian Zhang, Nianli Ma, Chun Jin
Language Technologies Institute, CMU

Page 2: Time Line for TDT Activities

Restarted TDT: Summer 2001. Tasks: FSD, SLD, Detection.
New techniques (Nov 2001 – present):
- Topic-conditional novelty (FSD)
- Situated NEs (all tasks)
- Source-conditional interpolated training (SLD)
Evaluations:
- TDT: Oct 2001, July 2002
- New FSD (internal): July 2002 (KDD Conference)

Page 3: 2002 Dry Run Results: DET

Evaluation Conditions                          Systran   EBMT         DICT
SR=nwt+bnasr TE=mul,eng boundary DEF=10        0.3646    0.3465 [1]
SR=nwt+bnasr TE=mul,eng noboundary DEF=10      0.4040
SR=nwt+bnman TE=arb,eng boundary DEF=10        0.2011                 0.6799 [2]
                                                                      0.1966 [3]
SR=nwt+bnman TE=arb,nat boundary DEF=10        0.1732

[1] Using our Mandarin-to-English EBMT, with our boundaries replaced by Systran's boundaries.
[2] Using our dictionary-based Arabic-to-English translation with our own boundaries, so the evaluation boundaries and our results are mismatched.
[3] Using our dictionary-based Arabic-to-English translation, with our boundaries replaced by Systran's boundaries.

Page 4: Baseline FSD Method (Unconditional)

- Dissimilarity with past: decision threshold on the most-similar story
- (Linear) temporal decay
- Length filter (for teasers)
- Cosine similarity with standard weights: tfidf = (1 + log(tf)) * log(N/df)
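A minimal sketch of this unconditional baseline, assuming simple dictionary vectors; the threshold and the decay window below are illustrative assumptions, not values from the slides:

```python
import math
from collections import Counter

def tfidf_vector(tokens, df, n_docs):
    """Weight each term by (1 + log tf) * log(N / df), the standard scheme."""
    tf = Counter(tokens)
    return {t: (1 + math.log(c)) * math.log(n_docs / df.get(t, 1))
            for t, c in tf.items()}

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def is_first_story(story_vec, story_time, past, threshold=0.2, decay_window=30.0):
    """Flag a story as novel if its decayed similarity to every past story
    stays below the threshold. `past` is a list of (vector, time) pairs.
    The linear temporal decay down-weights similarity to older stories."""
    best = 0.0
    for vec, t in past:
        age = story_time - t
        decay = max(0.0, 1.0 - age / decay_window)  # linear temporal decay
        best = max(best, decay * cosine(story_vec, vec))
    return best < threshold
```

The decision rule is exactly the one on the slide: a single threshold on the (decayed) similarity to the most-similar past story.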

Page 5: 2002 Dry Run Results: FSD

Evaluation Conditions                            (C_fsd)_norm   (C_fsd)_norm optimal
SR=nwt+bnasr; TE=eng,nat; boundary; DEF=10       0.6174         0.5846
SR=nwt+bnasr; TE=eng,nat; noboundary; DEF=10     0.6899         0.6403
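The (C_fsd)_norm figures are normalized detection costs. A sketch of the normalization, assuming the standard TDT cost parameters (C_Miss = 1, C_FA = 0.1, P_target = 0.02 come from the TDT evaluation specification, not from these slides):

```python
def detection_cost(p_miss, p_fa, c_miss=1.0, c_fa=0.1, p_target=0.02):
    """TDT detection cost: weighted sum of miss and false-alarm probabilities."""
    return c_miss * p_miss * p_target + c_fa * p_fa * (1 - p_target)

def normalized_cost(p_miss, p_fa, c_miss=1.0, c_fa=0.1, p_target=0.02):
    """(C_fsd)_norm: divide by the cost of the best trivial constant decision
    (always-yes or always-no), so 1.0 means no better than a trivial system."""
    trivial = min(c_miss * p_target, c_fa * (1 - p_target))
    return detection_cost(p_miss, p_fa, c_miss, c_fa, p_target) / trivial
```

The "optimal" column reports the same normalized cost at the best possible decision threshold on the DET curve.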

Page 6: 2002 Dry Run DET: CMU-FSD

Page 7: FSD Observations

- Cross-site comparable baselines (cost = .7)
- "Events-vs-Topics" issue (e.g., Asia crisis)
- A few mislabeled stories wreak havoc for FSD
- Eager auto-segmentation a problem (misses)

Recommendations for TDT labeling:
- FSD on true events, or events within topic(s)
- Change auto-segmentation optimality criterion??

Recommendations for TDT researchers:
- Keep working hard on FSD – not cracked yet

Page 8: New FSD Directions

Topic-conditional models:
- E.g., "airplane," "investigation," "FAA," "FBI," "casualties" indicate the topic, not the event
- "TWA 800," "March 12, 1997" indicate the event
- First categorize into a topic, then use maximally-discriminative terms within the topic

Rely on situated named entities:
- E.g., "Arcan as victim," "Sharon as peacemaker"
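The topic-conditional idea can be sketched as a two-level pipeline: assign each incoming story to a topic, then test novelty only against earlier stories of that topic. The nearest-centroid topic assignment and plain term counts below are illustrative stand-ins, not the actual CMU components:

```python
import math
from collections import Counter, defaultdict

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

class TwoLevelFSD:
    """Two-level FSD: topic assignment first, then within-topic novelty.
    Confusable cross-topic vocabulary never competes, because each story
    is only compared with the history of its own topic."""

    def __init__(self, topic_centroids, threshold=0.2):
        self.centroids = topic_centroids      # topic -> term-weight dict
        self.threshold = threshold
        self.history = defaultdict(list)      # topic -> list of past vectors

    def process(self, tokens):
        vec = dict(Counter(tokens))
        # Level 1: pick the closest topic centroid.
        topic = max(self.centroids, key=lambda k: cosine(vec, self.centroids[k]))
        # Level 2: novel iff dissimilar to every past story of that topic.
        novel = all(cosine(vec, old) < self.threshold
                    for old in self.history[topic])
        self.history[topic].append(vec)
        return topic, novel
```

In the full approach, the within-topic comparison would further use maximally-discriminative terms and weighted named entities rather than raw counts.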

Page 9: Broad Topics vs Events

Page 10: Two-level Scheme for FSD

Page 11: Confusability between Intra-topic Events

(Similarity-matrix panels: AIRPLANE ACCIDENTS, BOMBINGS)

• Each data point in the matrix is the similarity between the two corresponding documents.

• Documents are sorted by event as the first key and by time of arrival as the second key, so the diagonal sub-matrices contain intra-event document similarities, while the off-diagonal sub-matrices contain inter-event document similarities.
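The matrix described above can be reproduced as follows; raw term counts stand in for whatever weighting the actual plots used:

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def confusability_matrix(docs):
    """docs: list of (event_id, arrival_time, tokens).
    Sorting by event first and arrival time second puts intra-event
    similarities in diagonal blocks and inter-event similarities
    off-diagonal; returns the full pairwise cosine matrix."""
    docs = sorted(docs, key=lambda d: (d[0], d[1]))
    vecs = [dict(Counter(tokens)) for _, _, tokens in docs]
    return [[cosine(a, b) for b in vecs] for a in vecs]
```

High values leaking into off-diagonal blocks are exactly the intra-topic confusability the slide illustrates.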

Page 12: Measuring Effectiveness of NEs

[1] f denotes a named entity; S_k is the k-th of the seven NE types.
[2] We use the effectiveness of each NE type to measure how well it can differentiate intra-topic events.

Page 13: Effectiveness of Named Entities

Page 14: Experimental Design

- Baseline: conventional FSD
- Simple case: two-level FSD with "perfect" topic labels
- Ideal case: two-level FSD with "perfect" topic labels, weighted NEs, and topic-specific stop words removed
- Real case: same as the Ideal case, except using system-predicted topic labels

Page 15: Data Description

Broadcast News: published by Primary Source Media; 261,209 transcripts of news articles from ABC, CNN, NPR, and MSNBC, covering the period from 1992 to 1998.
Document structure: each document (story) is composed of several fields, such as Title, Topic, Keywords, Date, Abstract, and Body.
(Training) topic labels provided by PSM (4 topics): airplane accidents, bombings, tornadoes, hijackings.
CMU students labeled 36 events within the 4 topics (divided into 50% training and 50% test).

Page 16: Results for Topic-Conditioned FSD

Page 17: Confusability Reduction (5 events within the topic "airplane accidents" in test data)

NOTE:
1. These graphs contain only test data (5 events for the topic "airplane accidents").
2. The left graph is the Baseline; the right one is the Ideal case.

Page 18: Topic-Conditioned Approach to First Story Detection for TDT