

DISCOVERING EVENT EVOLUTION GRAPHS FROM NEWSWIRES

Christopher C. Yang and Xiaodong Shi

• Event Evolution and Event Evolution Graph:

• We define event evolution as the directional dependency, or relatedness, between two events within the same news affair that traces how the affair develops; the relationship between two such events is called an event evolution relationship.

• An event evolution graph is defined as a directed acyclic graph (DAG) G = (V, L) whose nodes are events, V = {ε1, ε2, …, εn}, and whose directed edges are event evolution relationships between the nodes, L = {(εi, εj)}.

A partial event evolution graph for the news topic “Beslan school hostage crisis”. The numbers in brackets indicate the events' temporal order. In total, it contains 10 events and 11 event evolution relationships.
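The acyclicity requirement in the definition above can be checked mechanically. The following is a minimal sketch, assuming events are identified by integers reflecting their temporal order and edges are (εi, εj) pairs. It uses Kahn's topological-sort algorithm; the function name `is_dag` and the toy graph are hypothetical illustrations, not the authors' code.

```python
from collections import deque


def is_dag(nodes, edges):
    """Return True if the directed graph (nodes, edges) contains no cycle.

    Kahn's algorithm: repeatedly remove nodes with no incoming edges.
    The graph is acyclic if and only if every node ends up removed.
    """
    indegree = {v: 0 for v in nodes}
    successors = {v: [] for v in nodes}
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1

    queue = deque(v for v in nodes if indegree[v] == 0)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return removed == len(nodes)


# Hypothetical toy graph: events numbered by temporal order, edges point forward in time.
events = [1, 2, 3, 4]
evolution_edges = [(1, 2), (1, 3), (3, 4), (2, 4)]
print(is_dag(events, evolution_edges))             # True
print(is_dag(events, evolution_edges + [(4, 1)]))  # False
```

Under the poster's confidence definition, a relationship has non-zero confidence only when i ≠ j and si ≤ sj, so retained edges never point backward in time. A cycle could only arise between two events sharing the same start time, which is the case a check like this would catch.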

• Overview:

• In Topic Detection and Tracking (TDT), news stories are often organized into a flat hierarchical structure in which inter-cluster relationships are missing.

• Modeling the event evolution of news topics and presenting it as a graph can be useful in various applications:

  • Directing users through a news topic during information browsing.

  • Combining it with automatic summarization techniques and graphical interfaces to provide a graphical web news infomediary.

• We propose to represent the event evolution of a news topic as an event evolution graph, a directed graph whose vertices are events and whose edges are event evolution relationships.

• Modeling Event Evolution Relationships:

• Previous research, e.g. event threading, uses the average pairwise story similarity to measure event evolution.

• Event threading neglects the properties of events and simply treats an event as an aggregate set of news stories.

• We propose to measure the event content similarity between two events as the cosine similarity of their event term vectors. This similarity is then combined with two decaying factors, temporal proximity and document distributional proximity, to measure the confidence of each event evolution relationship.

• Event Content Similarity:

• We use the simple bag-of-words model to represent the textual content of each news story.

• The event term vector of event εi is computed as the average of the document term vectors of the stories that belong to εi. TF weights are used instead of the traditional TF-IDF weights.

• The event content similarity cs(εi, εj) between events εi and εj is the cosine similarity of their event term vectors etv(Si) and etv(Sj), where etv(·) is the event term vector representation of a set of stories and Si is the set of stories belonging to event εi.
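The event term vector and content similarity described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes lowercase whitespace tokenization and raw, unnormalized term counts as the TF weights (the poster does not say whether TF is length-normalized). All function names are hypothetical.

```python
import math
from collections import Counter


def tf_vector(story):
    """Bag-of-words TF vector of one story: raw term counts.

    Assumption: lowercase whitespace tokenization, no length normalization.
    """
    return Counter(story.lower().split())


def event_term_vector(stories):
    """etv(S): the average of the TF vectors of the stories in S."""
    total = Counter()
    for story in stories:
        total.update(tf_vector(story))
    return {term: count / len(stories) for term, count in total.items()}


def cosine_sim(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def content_similarity(stories_i, stories_j):
    """cs(e_i, e_j) = cosine_sim(etv(S_i), etv(S_j))."""
    return cosine_sim(event_term_vector(stories_i), event_term_vector(stories_j))


print(event_term_vector(["school hostages", "school"]))  # {'school': 1.0, 'hostages': 0.5}
```

Averaging the story vectors makes etv independent of how many stories an event contains, so a heavily reported event does not dominate the similarity purely through volume.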

• Experiments:

• The event threading model combined with the Nearest Parent or Best Similarity graph model is used as the baseline.

• When combined with static thresholding, the event evolution model substantially outperforms the competing models (α = 0.5, β = 0.5).

The Precision and Recall Curves (Interpolated to Standard 11 Levels) of the Comparative Experimental Results
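The caption refers to precision interpolated to the standard 11 recall levels. Below is a minimal sketch of the usual 11-point interpolation from information retrieval, assuming (the poster does not state the protocol) that a system's candidate edges are ranked by confidence and compared against a manually annotated set of true event evolution relationships. Function names are hypothetical.

```python
def pr_points_from_ranking(ranked_edges, gold_edges):
    """(recall, precision) after each cutoff of a confidence-ranked edge list."""
    gold = set(gold_edges)
    points, hits = [], 0
    for k, edge in enumerate(ranked_edges, start=1):
        if edge in gold:
            hits += 1
        points.append((hits / len(gold), hits / k))
    return points


def interpolated_11_point(pr_points):
    """Interpolated precision at recall levels 0.0, 0.1, ..., 1.0.

    The interpolated precision at level r is the highest precision observed
    at any recall >= r, or 0.0 if recall r is never reached.
    """
    levels = [i / 10 for i in range(11)]
    return [max((p for rec, p in pr_points if rec >= r), default=0.0) for r in levels]


# Hypothetical ranking of three candidate edges against two true edges.
points = pr_points_from_ranking([(1, 2), (1, 3), (2, 3)], [(1, 2), (2, 3)])
print(interpolated_11_point(points))
```

Taking the maximum over all higher recall values makes each interpolated curve monotonically non-increasing, which is what allows curves from different models to be compared at the same fixed recall levels.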

[Figure: partial event evolution graph for “Beslan school hostage crisis”; event nodes, numbered by temporal order:]

Terrorists seized the Beslan school with hostages (1)

The negotiation talks with terrorists broke down (2)

26 women and infant hostages were freed, but most hostages were still held (3)

Special task force assaulted the terrorists and hundreds of hostages died (5)

Reactions and responses to the Beslan school hostage tragedy (6)

Russia moved to identify the suspects of the Beslan tragedy (7)

Russians rally against terrorism (8)

Russians conducted an investigation into the Beslan tragedy (9)

Russians claimed to strike Chechen terrorism (10)

Beslan school resumed classes after the hostage tragedy (11)

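$$conf(\varepsilon_i, \varepsilon_j) = \begin{cases} 0 & \text{if } i = j \text{ or } s_i > s_j \\ cs(\varepsilon_i, \varepsilon_j) \cdot tp(\varepsilon_i, \varepsilon_j) \cdot df(\varepsilon_i, \varepsilon_j) & \text{if } i \neq j \text{ and } s_i \le s_j \end{cases}$$

$$cs(\varepsilon_i, \varepsilon_j) = \mathrm{cosine\_sim}\big(etv(S_i), etv(S_j)\big)$$

$$df(\varepsilon_i, \varepsilon_j) = e^{-\beta \frac{m}{N}}$$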

where m is the number of documents belonging to the events that happen between events εi and εj, N is the total number of documents in the topic, and β is a decaying factor.
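The confidence formula and document distributional proximity can be sketched as below. This is a minimal illustration under stated assumptions: the caller supplies m (the document count of the in-between events) and precomputed cs and tp values, and the default β = 0.5 is taken from the experimental setting. The function names are hypothetical.

```python
import math


def distributional_proximity(m, n_total, beta=0.5):
    """df(e_i, e_j) = exp(-beta * m / N).

    m: number of documents belonging to the events between e_i and e_j.
    n_total: total number of documents in the topic (N).
    beta: decaying factor.
    """
    return math.exp(-beta * m / n_total)


def confidence(i, j, s_i, s_j, cs, tp, df):
    """conf(e_i, e_j): 0 if i == j or s_i > s_j, otherwise cs * tp * df."""
    if i == j or s_i > s_j:
        return 0.0
    return cs * tp * df


# Hypothetical numbers: 12 of 200 topic documents lie between the two events.
df_value = distributional_proximity(12, 200)
print(confidence(1, 2, 0, 3, cs=0.6, tp=0.9, df=df_value))
```

Because the three factors are multiplied, a relationship needs reasonable scores on all of them at once: high content similarity cannot fully compensate for two events that are far apart in time or separated by many intervening stories.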

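• Document Distributional Proximity:

• In some cases, such as a burst of events and stories, the proximity of news stories in their distribution is more useful than temporal proximity for measuring event evolution. During a burst, many events fall within a short time span, so temporal distance barely separates them, whereas the number of intervening documents still does.

• We define the document distributional proximity df(εi, εj) as an exponential decay, weighted by β, in m/N, the fraction of the topic's documents that belong to events between the two events.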

[Figure: precision (y-axis, 0 to 1) versus recall (x-axis, 0 to 1). Curves: EventEvolutionGraph+StaticThresholding, EventThreading+StaticThresholding, EventThreading+BestSimilarity, EventThreading+NearestParent, EventEvolutionGraph+BestSimilarity, EventEvolutionGraph+NearestParent]

• Static Thresholding:

• To prune a generated event evolution graph, we compute the confidence of every event evolution relationship and filter out undesirable ones with the following static thresholding model:

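G’ = (V, L’), where

$$L' = \{(\varepsilon_i, \varepsilon_j) \mid \varepsilon_i, \varepsilon_j \in V \text{ and } conf(\varepsilon_i, \varepsilon_j) \ge \theta\}$$

and θ is the confidence threshold.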
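Static thresholding amounts to a filter over candidate edges. Below is a minimal sketch, assuming confidences have already been computed for each candidate pair and that the threshold test is inclusive (the relation symbol is not recoverable from the poster text). The function name and the toy scores are hypothetical.

```python
def static_threshold(nodes, confidences, theta):
    """Prune an event evolution graph, returning G' = (V, L').

    nodes: the event set V.
    confidences: {(e_i, e_j): conf(e_i, e_j)} for candidate edges.
    theta: the static confidence threshold (assumed inclusive here).
    """
    kept = [
        (u, v)
        for (u, v), c in confidences.items()
        if u in nodes and v in nodes and c >= theta
    ]
    return nodes, kept


V = {1, 2, 3}
conf_scores = {(1, 2): 0.31, (1, 3): 0.04, (2, 3): 0.18}
print(static_threshold(V, conf_scores, theta=0.1))  # ({1, 2, 3}, [(1, 2), (2, 3)])
```

A single global θ keeps the model simple; the trade-off is that raising θ increases precision at the cost of recall, which is the trade-off the precision-recall curves trace.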

Flat hierarchical structure of news topics in TDT

Event evolution graph representation of news topics

• Temporal Proximity:

• Assume the timestamp of an event εi is a time interval [si, ei]. The temporal distance between two events εi and εj (with si ≤ sj) is:

$$d(t_i, t_j) = \begin{cases} s_j - e_i & \text{if } e_i < s_j \\ 0 & \text{if } e_i \ge s_j \end{cases}$$

• Intuitively, the farther apart two events are along the timeline, the less likely an event evolution relationship exists between them.

• The temporal proximity between two events εi and εj (with si ≤ sj) is:

$$tp(\varepsilon_i, \varepsilon_j) = e^{-\alpha \frac{d(t_i, t_j)}{T}}$$

where T is the event horizon, defined as the time span of the entire news affair, and α is the time-decaying weight (0 ≤ α ≤ 1).
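The temporal distance and temporal proximity formulas can be sketched as follows. This is a minimal illustration, assuming event timestamps are (start, end) pairs in a common unit such as days, that the caller passes the earlier-starting event first, and that α defaults to the experimental value 0.5. Function names and timestamps are hypothetical.

```python
import math


def temporal_distance(interval_i, interval_j):
    """d(t_i, t_j): gap between the end of e_i and the start of e_j.

    Intervals are (start, end) pairs; assumes s_i <= s_j.
    Overlapping or touching events have distance 0.
    """
    _, e_i = interval_i
    s_j, _ = interval_j
    return s_j - e_i if e_i < s_j else 0


def temporal_proximity(interval_i, interval_j, horizon, alpha=0.5):
    """tp(e_i, e_j) = exp(-alpha * d(t_i, t_j) / T), T being the event horizon."""
    return math.exp(-alpha * temporal_distance(interval_i, interval_j) / horizon)


# Hypothetical timestamps in days since the affair began; horizon T = 10 days.
print(temporal_distance((0, 2), (5, 7)))                # 3
print(temporal_proximity((0, 2), (1, 4), horizon=10))  # 1.0
```

Dividing by T makes the decay relative to the length of the whole affair, so a three-day gap weighs more heavily in a week-long story than in one spanning several months.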
