TRIPS and TRIOS System for TempEval-2: Extracting Temporal Information from Text

Naushad UzZaman and James F. Allen
Computer Science Department, University of Rochester, Rochester, NY, USA

International Workshop on Semantic Evaluations (SemEval-2010), Association for Computational Linguistics (ACL), Sweden, July 2010.


DESCRIPTION

Extracting temporal information from raw text is fundamental for deep language understanding, and key to many applications like question answering, information extraction, and document summarization. In this paper, we describe two systems we submitted to the TempEval-2 challenge for extracting temporal information from raw text. The systems use a combination of deep semantic parsing, Markov Logic Networks and Conditional Random Field classifiers. Our two submitted systems, TRIPS and TRIOS, approached all tasks and outperformed all teams in two tasks. Furthermore, TRIOS mostly had second-best performances in the other tasks. TRIOS also outperformed the other teams that attempted all the tasks. Our systems are notable in that, for Tasks C-F, they operated on raw text while all other systems used tagged events and temporal expressions in the corpus as input.

TRANSCRIPT

TRIPS and TRIOS System for TempEval-2: Extracting Temporal Information from Text

Naushad UzZaman and James F. Allen
Computer Science Department
University of Rochester, Rochester, NY, USA

International Workshop on Semantic Evaluations (SemEval-2010), Association for Computational Linguistics (ACL), Sweden, July 2010.


Task B: Event extraction

The New York Times said in an editorial on Saturday, April 25: The Supreme Court took a detour this week from the core principle of gender fairness it vindicated two years ago in its ruling invalidating the use of sexual stereotypes to justify denying women admission to the Virginia Military Institute.

By a 6-3 vote, the court upheld a discriminatory immigration law that gives a child born overseas to an unmarried American woman a better chance at citizenship than a child born to an unmarried American man.


Task A: Temporal Expression extraction

(Same example passage as under Task B above.)


Tasks C-F: Temporal relations extraction

(Same example passage as under Task B above.)


Our Systems

• TRIPS: based on the TRIPS parser

• TRIOS: a hybrid of the TRIPS parser and machine learning classifiers (MLN, CRF)


Outline

• Our System Modules

• TRIPS Parser

• Markov Logic Network

• Task B: Event extraction

• Task A: Temporal Expression extraction

• Tasks C-F: Identify temporal relations

• Future Work and Summary


TRIPS Parser: Broad coverage deep parsing

[Figure: TRIPS parser architecture. Input text is preprocessed by POS tagging, named entity recognizers, an address recognizer, and a statistical parser that supplies bracketing preferences; the resulting word, name, and address hypotheses populate the input chart. A lexicon-on-demand component draws on Comlex and WordNet/WordFinder to add new lexical entries to the core lexicon & LF ontology. The parser, using the grammar and guided by semantic and LF-form preferences, fills the output chart, from which the content extractor heuristically produces the final logical form. Example input: "Advanced Medical paid $106 million in cash for its share in a unit of Henley's Fisher Scientific subsidiary."]



Markov Logic Network

• Addresses the limitations of purely rule-based systems and of purely statistical machine learning techniques

• Markov logic = first-order logic + Markov network (a probabilistic graphical model)

• First-order logic formulas with weights

• a formula's weight determines the penalty paid when the formula is violated

Example, from "It is not going to change":

tense(e1, INFINITIVE) & aspect(e1, NONE) => class(e1, OCCURRENCE)   (weight = 0.319913)

TheBeast: http://code.google.com/p/thebeast/
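To make the weighted-formula idea concrete, here is a minimal Python sketch of the log-linear scoring that underlies an MLN. The formula and weight are the ones from the slide above; the dictionary-based world encoding is an illustrative assumption, not TheBeast's actual API.

import math

# A weighted formula adds its weight to a world's score when satisfied,
# so violating it costs exactly the weight (0.319913, from the slide).
WEIGHTED_FORMULAS = [
    # tense(e1, INFINITIVE) & aspect(e1, NONE) => class(e1, OCCURRENCE)
    (0.319913,
     lambda w: not (w["tense"] == "INFINITIVE" and w["aspect"] == "NONE")
     or w["class"] == "OCCURRENCE"),
]

def score(world):
    """Sum of the weights of all satisfied formulas."""
    return sum(wt for wt, holds in WEIGHTED_FORMULAS if holds(world))

def probability(world, candidates):
    """P(world) is proportional to exp(score), normalized over candidates."""
    z = sum(math.exp(score(w)) for w in candidates)
    return math.exp(score(world)) / z

# Two candidate labelings of event e1 in "It is not going to change":
w1 = {"tense": "INFINITIVE", "aspect": "NONE", "class": "OCCURRENCE"}
w2 = {"tense": "INFINITIVE", "aspect": "NONE", "class": "STATE"}
print(round(probability(w1, [w1, w2]), 3))  # ~0.579: the formula tilts, not forces

Unlike a hard rule, the violated world w2 keeps nonzero probability; this soft behavior is what the weights buy over plain first-order logic.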



He fought in the war

TRIPS parser output:

(SPEECHACT V1 SA-TELL :CONTENT V2)
(F V2 (:* FIGHTING FIGHT) :AGENT V3 :MODS (V4) :TMA ((TENSE PAST)))
(PRO V3 (:* PERSON HE) :CONTEXT-REL HE)
(F V4 (:* SITUATED-IN IN) :OF V2 :VAL V5)
(THE V5 (:* ACTION WAR))

Events are extracted from the logical form with 100+ extraction rules, e.g.:

((THE ?x (? type SITUATION-ROOT))
 -extract-noms>
 (EVENT ?x (? type SITUATION-ROOT) :pos NOUN :class OCCURRENCE))

Extracted output:

<EVENT eid=V2 word=FIGHT pos=VERBAL ont-type=FIGHTING tense=PAST class=OCCURRENCE voice=ACTIVE aspect=NONE polarity=POSITIVE nf-morph=NONE>
<RLINK eventInstanceID=V2 ref-word=HE ref-ont-type=PERSON relType=AGENT>
<SLINK signal=IN eventInstanceID=V2 subordinatedEventInstance=V5 relType=SITUATED-IN>
<EVENT eid=V5 word=WAR pos=NOUN ont-type=ACTION class=OCCURRENCE voice=ACTIVE polarity=POSITIVE aspect=NONE tense=NONE>

Event and event-feature extraction using the TRIPS parser
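As a rough illustration of how such a rule fires, the Python sketch below applies the -extract-noms> pattern to a list-encoded LF term. The tuple encoding and the toy ontology are assumptions made for the sketch, not the actual TRIPS machinery.

# Toy ontology fragment: child type -> parent type (assumed for the sketch).
ONTOLOGY = {"ACTION": "SITUATION-ROOT"}

def is_a(ont_type, ancestor):
    """True if ont_type is the ancestor or lies below it in the ontology."""
    while ont_type is not None:
        if ont_type == ancestor:
            return True
        ont_type = ONTOLOGY.get(ont_type)
    return False

def extract_noms(lf_terms):
    """((THE ?x (? type SITUATION-ROOT)) -extract-noms>
       (EVENT ?x ... :pos NOUN :class OCCURRENCE)) over list-encoded terms."""
    events = []
    for spec, var, (_, ont_type, word) in lf_terms:
        if spec == "THE" and is_a(ont_type, "SITUATION-ROOT"):
            events.append({"eid": var, "word": word, "ont-type": ont_type,
                           "pos": "NOUN", "class": "OCCURRENCE"})
    return events

# (THE V5 (:* ACTION WAR)) -> the nominal event "war"
print(extract_noms([("THE", "V5", (":*", "ACTION", "WAR"))]))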


TRIPS and TRIOS System for event extraction (Task B)

• TRIPS event and event feature extraction

• Uses TRIPS parser

• High recall, lower precision

• Event-feature extraction performance is not the best

• TRIOS event and event feature extraction

• Take TRIPS events

• MLN for filtering TRIPS events

• MLNs to classify event features


Event Extraction Performance

System | Precision | Recall | F-score
TRIOS | 0.80 | 0.74 | 0.77
TRIPS | 0.55 | 0.88 | 0.68
Best (TIPSem) | 0.81 | 0.86 | 0.84

Table 1: Performance of event extraction (Task B) in TempEval-2

Feature | TRIPS | TRIOS | Best
Class | 0.67 | 0.77 | 0.79 (TIPSem)
Tense | 0.67 | 0.91 | 0.92 (Edinburgh-LTG)
Aspect | 0.97 | 0.98 | 0.98
Pos | 0.88 | 0.96 | 0.97 (TIPSem, Edinburgh-LTG)
Polarity | 0.99 | 0.99 | 0.99
Modality | 0.95 | 0.95 | 0.99 (Edinburgh-LTG)

Table 2: Performance of event-feature extraction in TempEval-2 (Task B)



Temporal Expression Extraction (Task A)

• Recognizing Temporal Expressions

• TRIPS parser

• Conditional Random Field classifier (CRF++)

• Features: word, shape, is year, is day of week, is month, is number, is time string, is day, is quarter, is punctuation, whether the word belongs to a word list such as init-list or follow-list, etc.

• Determining the Normalized Value and Type

• Rule-based technique

• Matches regular expressions

• Released as open source

[Figure: CRF feature window. For a phrase like "last five months" or "next ten years", the classifier sees features over cw-2, cw-1, and the current word: "last"/"next" match init-list, "five"/"ten" fire is-number, and "months"/"years" fire is-time-string.]
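A minimal sketch of what such token-level features could look like before being handed to CRF++. The feature names and word lists here are illustrative assumptions; the real system defines its features via CRF++ templates.

import re

INIT_LIST = {"last", "next", "this", "mid"}   # assumed trigger-word list
MONTHS = {"january", "march", "april", "june", "december"}  # abbreviated here

def shape(w):
    """Map digits/letters to d/X/x, e.g. 'April' -> 'Xxxxx', '1998' -> 'dddd'."""
    return re.sub(r"[a-z]", "x", re.sub(r"[A-Z]", "X", re.sub(r"\d", "d", w)))

def token_features(tokens, i):
    w = tokens[i]
    return {
        "word": w.lower(),
        "shape": shape(w),
        "is_year": bool(re.fullmatch(r"(19|20)\d\d", w)),
        "is_month": w.lower() in MONTHS,
        "is_number": w.isdigit(),
        "in_init_list": w.lower() in INIT_LIST,
        "prev_word": tokens[i - 1].lower() if i > 0 else "<S>",   # cw-1
        "prev2_word": tokens[i - 2].lower() if i > 1 else "<S>",  # cw-2
    }

print(token_features(["over", "the", "last", "two", "years"], 2))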

Temporal expression | Type | Value
March 1, 1998; 14:11 hours (the DCT, given) | TIME | 1998-03-01T14:11:00
Sunday | DATE | 1998-03-01
last week | DATE | 1998-W08
mid afternoon | TIME | 1998-03-01TAF
nearly two years | DURATION | P2Y
each month | SET | P1M

Table 3: Examples of normalized values and types for temporal expressions according to TimeML
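For instance, the "last week" row can be normalized with a little date arithmetic relative to the DCT. The Python sketch below covers just two rows of the table and is illustrative only; the released normalizer handles far more patterns via regular expressions.

from datetime import date, timedelta

def normalize(expr, dct):
    """Tiny rule-based normalizer for two patterns from the table."""
    expr = expr.lower().strip()
    if expr == "last week":
        monday = dct - timedelta(days=dct.weekday())  # Monday of DCT's ISO week
        prev = monday - timedelta(weeks=1)            # Monday of previous week
        return "DATE", "%d-W%02d" % (prev.year, prev.isocalendar()[1])
    if expr == "nearly two years":
        return "DURATION", "P2Y"                      # TimeML duration format
    return None

dct = date(1998, 3, 1)                     # DCT: March 1, 1998
print(normalize("last week", dct))         # ('DATE', '1998-W08'), as in the table
print(normalize("nearly two years", dct))  # ('DURATION', 'P2Y')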


Performance on Temporal Expression Extraction

Measure | TRIPS & TRIOS | Best (HeidelTime-1)
Extraction: Precision | 0.85 | 0.90
Extraction: Recall | 0.85 | 0.82
Extraction: F-score | 0.85 | 0.86
Normalization: type | 0.94 | 0.96
Normalization: value | 0.76 | 0.85

Table 4: Performance on temporal expression extraction (Task A)


Overview

[Figure: system overview. Raw text is fed to the TRIPS parser and to the CRF timex extractor. The parser's events and event features go directly to Task B (the TRIPS run) and, after MLN event filtering and MLN event-feature classification, to Task B as the TRIOS run. The CRF timex extractor and the timex normalizer produce the temporal expressions and timex features used for Task A.]


Identifying Temporal Relations (Tasks C-F)

• TempEval-2 Tasks:

• Task C: Temporal relation between event and temporal expression in the same sentence

• Task D: Temporal relation between event and document creation time

• Task E: Temporal relation between the main events of adjacent sentences

• Task F: Temporal relation between two events where one event syntactically dominates the other

• Our approach

• Based on MLN

• Features from the TRIPS parser and the TRIOS system


Features | Task C | Task D | Task E | Task F
Event Class | YES | YES | e1 x e2 | e1 x e2
Event Tense | YES | YES | e1 x e2 | e1 x e2
Event Aspect | YES | YES | e1 x e2 | e1 x e2
Event Polarity | YES | YES | e1 x e2 | e1 x e2
Event Stem | YES | YES | e1 x e2 | e1 x e2
Event Word | YES | YES | YES | YES
Event Constituent | YES | – | e1 x e2 | e1 x e2
Event Ont-type | YES | YES | e1 x e2 | e1 x e2
Event LexAspect x Tense | YES | YES | e1 x e2 | e1 x e2
Event Pos | YES | YES | e1 x e2 | e1 x e2
Timex Word | YES | – | – | –
Timex Type | YES | YES | – | –
Timex Value | YES | YES | – | –
Timex DCT relation | YES | YES | – | –
Event's semantic role | YES | YES | – | –
Event's argument's ont-type | YES | YES | – | –
TLINK event-time signal | YES | YES | – | –
SLINK event-event relation type | – | – | – | YES

Table 5: Features used for Tasks C, D, E and F ("e1 x e2" marks features paired across the two events).
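The "e1 x e2" entries denote cross-product features: for the event-event tasks (E and F), each single-event feature is paired across the two events so the classifier sees the combination rather than two independent values. A minimal sketch, with illustrative feature names:

def pair_features(e1, e2, names=("tense", "aspect", "class", "pos")):
    """Cross one event's feature values with the other's, as in 'e1 x e2'."""
    return {"%s:%sx%s" % (n, e1[n], e2[n]): 1 for n in names}

# A reporting event paired with the event it reports on:
e1 = {"tense": "PAST", "aspect": "NONE", "class": "REPORTING", "pos": "VERB"}
e2 = {"tense": "PAST", "aspect": "NONE", "class": "OCCURRENCE", "pos": "VERB"}
print(pair_features(e1, e2))
# {'tense:PASTxPAST': 1, 'aspect:NONExNONE': 1,
#  'class:REPORTINGxOCCURRENCE': 1, 'pos:VERBxVERB': 1}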


Performance on Tasks C-F

Task | TRIPS Precision | TRIPS Recall | TRIOS Precision | TRIOS Recall | Best Precision (with corpus features)
Task C | 0.63 | 0.52 | 0.65 | 0.52 | 0.63 (JU-CSE, UCFD, NCSU-indi)
Task D | 0.76 | 0.69 | 0.79 | 0.67 | 0.82 (TIPSem)
Task E | 0.58 | 0.50 | 0.56 | 0.42 | 0.55 (TIPSem)
Task F | 0.59 | 0.54 | 0.60 | 0.46 | 0.66 (NCSU-individual)

Table 6: Performance of temporal relation extraction in TempEval-2 (Tasks C-F)


Overall Performance in TempEval-2

Task | Description | Best
Task A | Temporal expression extraction | TRIOS
Task B | Event extraction | TIPSem
Task C | Event-timex relationship | TRIOS
Task D | Event-DCT relationship | TIPSem
Task E | Main event-event relationship | TRIOS
Task F | Subordinate event-event relationship | TRIOS

Table 7: Head-to-head comparison of TRIOS, TIPSem and JU-CSE-TEMP (the teams that approached all tasks) in the TempEval-2 challenge



Future Work

• Automatically build a larger temporally annotated corpus:

• for the news domain, with human review

• explore other domains

• Temporal structure in discourse


Summary

• Approached all six tasks in TempEval-2 (2010)

• Systems: hybrids of a deep semantic parser and machine learning classifiers

• TRIOS: outperformed every other system that approached all tasks

• TRIPS: higher recall; could be used for automatic temporal annotation with human review


Benefits of TRIPS ontology

• Superior semantic ontology; better abstraction

• No problem with word sense disambiguation

• Considers semantic roles for disambiguation

• Helps to generate better links


Table 8: Most common relTypes used in SLINKs and RLINKs

Our Role | VerbNet equivalents | LIRICS equivalents | SLINK count | RLINK count
Agent | Agent, Actor | Agent | 19 | 709
Theme | Theme, Stimulus | Theme | 336 | 1137
Affected | Patient | Patient | 13 | 92
Cause | Cause | Cause | 49 | –
Goal-as-Loc | Destination | finalLocation | 47 | –
To-Loc | Recipient | Goal | 46 | –
At-Loc | Location | Location | 42 | –
In-Loc | Location | Location | 28 | –
On | Location | Location | 20 | –
Situated-In | Location? | Location? | 39 | –
Purpose | – | Purpose | 226 | –