Hypothesis Transformation and Semantic Variability Rules Used in RTE

Adrian Iftene, Alexandra Balahur-Dobrescu
Faculty of Computer Science, “Al. I. Cuza” University, Iasi, Romania

Post on 25-Jun-2015

Overview

• System presentation
• Tools
• Resources
• Semantic variability rules
• Fitness calculation
• Results
• Peer-to-Peer architecture
• Conclusions and Future Work

System presentation

[Architecture diagram; only the labels survive:
- Initial data
- Minipar module: dependency trees for (T, H) pairs
- LingPipe module: named entities for (T, H) pairs
- Resources: DIRT, Acronyms, Background knowledge, WordNet, Wikipedia
- Core Module1, Core Module2, Core Module3 on P2P computers
- Final result]

Tools - LingPipe

LingPipe (http://www.alias-i.com/lingpipe) is a suite of Java libraries for the linguistic analysis of human language. The major tools are for:
• Sentences
• Parts of speech
• Named entities
• Coreference

Example: hypothesis from pair 111: Leloir was born in Argentina.

<ENAMEX TYPE="PERSON">Leloir</ENAMEX> was born in <ENAMEX TYPE="LOCATION">Argentina</ENAMEX>.
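
The annotated hypothesis above can be read back with a small regular expression. This is only a sketch of how such ENAMEX markup might be consumed downstream; it does not use the LingPipe API, and the function names are hypothetical.

```python
import re

# Matches one ENAMEX element and captures its TYPE attribute and its text.
ENAMEX = re.compile(r'<ENAMEX TYPE="([^"]+)">(.*?)</ENAMEX>')

def extract_entities(annotated):
    """Return (entity_type, surface_text) pairs in order of appearance."""
    return ENAMEX.findall(annotated)

def strip_markup(annotated):
    """Return the plain sentence with the ENAMEX tags removed."""
    return ENAMEX.sub(r"\2", annotated)

annotated = ('<ENAMEX TYPE="PERSON">Leloir</ENAMEX> was born in '
             '<ENAMEX TYPE="LOCATION">Argentina</ENAMEX>.')

entities = extract_entities(annotated)
plain = strip_markup(annotated)
print(entities)
print(plain)
```

Keeping the entity type next to the surface text lets a later step compare named entities of the hypothesis with those of the text.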

Tools - MINIPAR

MINIPAR (Lin, 1998) transforms the text and the hypothesis into dependency trees.

Example: Le Beau Serge was directed by Chabrol.

(
E0 (() fin C * )
1 (Le ~ U 3 lex-mod (gov Le Beau Serge))
2 (Beau ~ U 3 lex-mod (gov Le Beau Serge))
3 (Serge Le Beau Serge N 5 s (gov direct))
4 (was be be 5 be (gov direct))
5 (directed direct V E0 i (gov fin))
E2 (() Le Beau Serge N 5 obj (gov direct) (antecedent 3))
6 (by ~ Prep 5 by-subj (gov direct))
7 (Chabrol ~ N 6 pcomp-n (gov by))
8 (. ~ U * punc)
)

Dependency tree (edge label: node):
direct (V)
    s: Le_Beau_Serge (N)
        lex-mod: Le (U)
        lex-mod: Beau (U)
    be: be (be)
    by: Chabrol (N)
    obj: Le_Beau_Serge (N)
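
The MINIPAR output for the example can be turned into a head-to-children map with a few lines of code. The sketch below assumes the one-node-per-line layout of this particular example (id, word, lemma, category, head, relation, then optional "gov" and "antecedent" annotations); it is not a general MINIPAR reader, and all names in it are hypothetical.

```python
import re

HEAD_LIKE = re.compile(r"^(\*|\d+|E\d+)$")  # "*", a node number, or an empty-node id such as E0

def parse_node(line):
    """Parse one node line, e.g. '3 (Serge Le Beau Serge N 5 s (gov direct))'."""
    node_id, body = line.strip().split(" ", 1)
    body = body.strip()[1:-1]  # drop the outer parentheses
    # Drop the trailing "(gov ...)" and "(antecedent ...)" annotations.
    body = re.sub(r"\((?:gov|antecedent)[^)]*\)", "", body)
    tokens = body.split()
    if HEAD_LIKE.match(tokens[-1]):  # no relation label, e.g. the root E0
        relation, head, rest = None, tokens[-1], tokens[:-1]
    else:
        relation, head, rest = tokens[-1], tokens[-2], tokens[:-2]
    word, pos = rest[0], rest[-1]
    lemma = " ".join(rest[1:-1])
    if lemma == "~":  # "~" means the lemma is the same as the word
        lemma = word
    return {"id": node_id, "word": word, "lemma": lemma,
            "pos": pos, "head": head, "relation": relation}

lines = [
    "E0 (() fin C * )",
    "1 (Le ~ U 3 lex-mod (gov Le Beau Serge))",
    "2 (Beau ~ U 3 lex-mod (gov Le Beau Serge))",
    "3 (Serge Le Beau Serge N 5 s (gov direct))",
    "4 (was be be 5 be (gov direct))",
    "5 (directed direct V E0 i (gov fin))",
    "E2 (() Le Beau Serge N 5 obj (gov direct) (antecedent 3))",
    "6 (by ~ Prep 5 by-subj (gov direct))",
    "7 (Chabrol ~ N 6 pcomp-n (gov by))",
    "8 (. ~ U * punc)",
]

nodes = [parse_node(line) for line in lines]

# Map each head id to its (relation, lemma) children, in sentence order.
children = {}
for node in nodes:
    children.setdefault(node["head"], []).append((node["relation"], node["lemma"]))

print(children["5"])
```

Here node 5 ("direct") collects its subject, auxiliary, object and by-phrase. The tree on the slide additionally collapses "by" and "Chabrol" into one edge labelled "by", which this sketch does not do.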

Resources

• DIRT - Discovery of Inference Rules from Text
• Extended WordNet
• Acronyms
• Background Knowledge

Resources – DIRT

DIRT is both an algorithm and a resulting knowledge collection (Lin and Pantel, 2001).

Paths similar to "X solves Y":
• Y is solved by X
• X resolves Y
• X finds a solution to Y
• X tries to solve Y
• X deals with Y
• Y is resolved by X
• …

Example: Le Beau Serge was directed by Chabrol

Paths in the dependency tree:
N:s:V<direct>V:by:N
N:obj:V<direct>V:by:N

N:s:V<direct>V:
:V<direct>V:by:N
:V<direct>V:by:N
N:obj:V<direct>V:

Resources – DIRT (cont...)

[Diagram, "Left – left relations similarity": the hypothesis verb has relation1 to its left subtree and relation2 to its right subtree; the text verb has relation1 to its left subtree and relation3 to its right subtree.]

Pair 37:
T: She was transferred again to Navy when the American Civil War began, 1861.
H: The American Civil War started in 1861.
H’: The American Civil War began in 1861.

Resources – DIRT (cont...)

[Diagram, "Left – right relations similarity": the hypothesis verb has relation1 to its left subtree and relation2 to its right subtree; the text verb has relation3 to its left subtree and relation1 to its right subtree.]

Pair 161:
T: The demonstrators, convoked by the solidarity with Latin America committee, verbally attacked Salvadoran President Alfredo Cristiani.
H: President Alfredo Cristiani was attacked by demonstrators.
H’: Demonstrators attacked President Alfredo Cristiani.

Resources – eXtended WordNet

For every synonym, we check which word appears in the text tree and select the mapping with the best value according to eXtended WordNet (http://xwn.hlt.utdallas.edu/downloads.html).

For example, the relation between “relative” and “niece” is made with a score of 0.078652.

Resources - Acronyms

The acronym database (http://www.acronym-guide.com) helps our program find relations between an acronym and its meaning, for example “US - United States”.

Resources – Background Knowledge

Argentine [is] Argentina

Chinese [in] China

Los Angeles [in] California

2 [is] two

Netherlands [is] Holland

“Argentine”: extracted snippets from Wikipedia:
ar |calling_code = 54 |footnotes = Argentina also has a territorial dispute
Argentina', , Nación Argentina (Argentine Nation) for many legal purposes), is
in the world. Argentina occupies a continental surface area of
Argentina national football team

Relations extracted for “Netherlands”:
Netherlands [is] Dutch
Netherlands [is] Nederlandse
Netherlands [is] Antillen
Netherlands [in] Europe
Netherlands [is] Holland
Antilles [in] Netherlands

The patterns used are usually “definition” patterns:
- verbs like “is”, “define”, “represent”, etc.
- punctuation context: , “ ‘ () [] :
- anaphora resolution

Semantic Variability Rules

• Negation rule: given by terms like “no”, “not”, “never”
• Modal verbs: “may”, “might”, “cannot”, “should”, “could”
• Certain cases for the particle “to”, when it is preceded by:
  - a verb like “allow”, “impose”, “galvanize”
  - an adjective like “necessary”, “compulsory”, “free”
  - a noun like “attempt”, “trial”
• Influence of context:
  - Positive words: “certainly”, “absolutely”
  - Negative words: “probably”, “likely”
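
The rules above can be pictured as a polarity check on the words that precede a verb. The following is a simplified sketch, not the authors' implementation: it assumes lemmatised, lower-cased context words, treats every listed marker the same way, and leaves a verb positive when only "positive" words such as “certainly” occur. The function and set names are hypothetical.

```python
NEGATIONS = {"no", "not", "never"}
MODALS = {"may", "might", "cannot", "should", "could"}
UNCERTAIN = {"probably", "likely"}  # the "negative" context words
# Verbs, adjectives and nouns that change the verb after the particle "to".
TO_TRIGGERS = {"allow", "impose", "galvanize",
               "necessary", "compulsory", "free",
               "attempt", "trial"}

MARKERS = NEGATIONS | MODALS | UNCERTAIN

def verb_is_positive(left_context):
    """left_context: lemmas immediately preceding the verb, in sentence order."""
    words = [w.lower() for w in left_context]
    if any(w in MARKERS for w in words):
        return False
    # "<trigger> to <verb>", e.g. "attempt to escape".
    if len(words) >= 2 and words[-1] == "to" and words[-2] in TO_TRIGGERS:
        return False
    return True

examples = {
    "did not attack": verb_is_positive(["do", "not"]),
    "attempt to escape": verb_is_positive(["attempt", "to"]),
    "certainly won": verb_is_positive(["certainly"]),
    "might win": verb_is_positive(["might"]),
}
print(examples)
```

Counting the verbs for which this check succeeds gives the number of positive verbs used by the negation value in the fitness calculation.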

Fitness calculation 1

Local Fitness:
• 1 for direct mapping, Acronyms, BK
• DIRT score
• eXtended WordNet score

Extended Local Fitness:
• Local Fitness
• Parent Fitness
• Mapping of edge label
• Node Position (left or right)

[Diagram: a node of the hypothesis tree mapped onto the text tree, showing the node mapping, the father mapping and the edge label mapping.]
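
The local fitness cases listed above can be sketched as a lookup cascade. The order in which the resources are consulted, the fallback value of 0.0 for unmapped nodes, and the DIRT score are assumptions made for illustration; only the 0.078652 score and the sample acronym and background-knowledge pairs come from the slides.

```python
# Tiny stand-ins for the real resources (contents partly hypothetical).
ACRONYMS = {"US": "United States"}
BACKGROUND = {("Argentine", "Argentina"), ("Netherlands", "Holland")}
DIRT_SCORES = {("start", "begin"): 0.30}        # hypothetical score
XWN_SCORES = {("relative", "niece"): 0.078652}  # score quoted on the eXtended WordNet slide

def local_fitness(hyp_word, text_word):
    """Fitness of mapping one hypothesis node onto one text node."""
    if hyp_word == text_word:                   # direct mapping
        return 1.0
    if ACRONYMS.get(hyp_word) == text_word or ACRONYMS.get(text_word) == hyp_word:
        return 1.0
    if (hyp_word, text_word) in BACKGROUND:     # background knowledge
        return 1.0
    pair = (hyp_word, text_word)
    if pair in DIRT_SCORES:
        return DIRT_SCORES[pair]
    # Assumption: nodes with no mapping in any resource score 0.0.
    return XWN_SCORES.get(pair, 0.0)

print(local_fitness("company", "company"))
print(local_fitness("US", "United States"))
print(local_fitness("relative", "niece"))
```

The extended local fitness then combines this value with the parent fitness, the edge label mapping and the node position; the slides do not give that combination, so it is not sketched here.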

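Fitness calculation 2

Total Fitness:

\[ TF = \frac{\sum_{node \in H} \mathit{ExtendedLocalFitness}_{node}}{\mathit{HypothesisNodesNumber}} \]

The Negation Value:

\[ NV = \frac{\mathit{Positive\_VerbsNumber}}{\mathit{TotalNumberOfVerbs}} \]

Global Fitness:

\[ \mathit{GlobalFitness} = NV \cdot TF + (1 - NV) \cdot (4 - TF) \]

Threshold value = 2.06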

Fitness calculation 3

T: The French railway company SNCF is cooperating in the project.
H: The French railway company is called SNCF.

Initial entity           Node Fitness   Extended local fitness
(the, company, det)      1              3.125
(French, company, nn)    1              3.125
(railway, company, nn)   1              3.125
(company, call, s)       1              2.5
(be, call, be)           1              4
(call, -, -)             0.096          3.048
(company, call, obj)     1              1.125
(SNCF, call, desc)       1              2.625

• Total_Fitness = (3.125 + 3.125 + 3.125 + 2.5 + 4 + 3.048 + 1.125 + 2.625)/8 = 22.673/8 = 2.834
• Positive_Verbs_Number = 1/1 = 1
• GlobalFitness = 1*2.834 + (1 – 1)*(4 – 2.834) = 2.834
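
The worked example can be checked with a few lines of code that follow the total fitness, negation value and global fitness definitions. The function names are hypothetical, and the final comparison assumes that a global fitness above the 2.06 threshold means the entailment holds, which the slides do not state explicitly.

```python
THRESHOLD = 2.06  # threshold value from the "Fitness calculation 2" slide

def total_fitness(extended_local_fitness):
    """Mean extended local fitness over the hypothesis nodes."""
    return sum(extended_local_fitness) / len(extended_local_fitness)

def negation_value(positive_verbs, total_verbs):
    """Share of hypothesis verbs that are positive."""
    return positive_verbs / total_verbs

def global_fitness(tf, nv):
    """NV * TF + (1 - NV) * (4 - TF)."""
    return nv * tf + (1 - nv) * (4 - tf)

# Extended local fitness values of the eight hypothesis nodes in the example.
scores = [3.125, 3.125, 3.125, 2.5, 4, 3.048, 1.125, 2.625]

tf = total_fitness(scores)
nv = negation_value(1, 1)
gf = global_fitness(tf, nv)
print(round(tf, 3), nv, round(gf, 3), gf > THRESHOLD)
```

With NV = 1 the global fitness equals the total fitness; with NV = 0 the same pair would score 4 - TF instead, so a high tree similarity under negation lowers the result.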

Results 1

Run     IE     IR      QA      SUM     Global
Run01   0.57   0.69    0.87    0.635   0.6913
Run02   0.57   0.685   0.865   0.645   0.6913

Component relevance:

System Description   Precision   Relevance
Without DIRT         0.6876      0.54 %
Without WordNet      0.6800      1.63 %
Without Acronyms     0.6838      1.08 %
Without BK           0.6775      2.00 %
Without SVR          0.6763      2.17 %
Without NEs          0.5758      16.71 %
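
The Relevance column is numerically consistent with the relative loss in precision against the full system's global score of 0.6913. That reading is inferred from the numbers rather than stated on the slide; the sketch below, with hypothetical names, recomputes the column under this assumption.

```python
BASELINE = 0.6913  # global precision of the full system (Run01 and Run02)

# Precision of the system with one component removed, as reported on the slide.
ABLATIONS = {
    "Without DIRT": 0.6876,
    "Without WordNet": 0.6800,
    "Without Acronyms": 0.6838,
    "Without BK": 0.6775,
    "Without SVR": 0.6763,
    "Without NEs": 0.5758,
}

def relevance(precision, baseline=BASELINE):
    """Relative precision loss, in percent, when a component is removed."""
    return (baseline - precision) / baseline * 100

computed = {name: round(relevance(p), 2) for name, p in ABLATIONS.items()}
for name, value in computed.items():
    print(f"{name:18s} {value:6.2f} %")
```

Under this reading, named entities are by far the most influential component, while removing DIRT changes the result the least.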

Results 2

Pilot task: Yes, No/Unknown + answer justification

           Accuracy   F(b=1/3)   Precision   Recall
System 1   0.569      0.595      0.547       0.805
System 2   0.471      0.475      0.437       0.643

Table over 12 submitted runs:

         Accuracy   F(beta=1/3)
min      0.365      0.211
median   0.471      0.475
max      0.731      0.753

Mean [understandability correctness]: [4.1 2.0] [4.3 2.8]* [4.1 1.5] [2.7 1.2] [3.2 1.5] [3.1 1.5]

Peer-to-Peer Architecture

[Diagram labels: Initiator; six CM nodes; DIRT db; Acronyms; SMB upload; SMB download]

• Speed optimization: P2P architecture, cache mechanism
• Ending synchronization: quota mechanism

Results

No   Run details                                                          Duration
1    One computer without caching mechanism                               5:28:45
2    One computer with caching mechanism, but with empty cache at start   2:03:13
3    One computer with full cache at start                                0:00:41
4    5 computers with 7 processes                                         0:00:06.7
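
To make the gains easier to compare, the durations can be converted into speedup factors relative to the first run. This is simple arithmetic on the reported values, with the run 4 duration read as 0:00:06.7; the helper name is hypothetical.

```python
from datetime import timedelta

def parse_duration(text):
    """Convert 'H:MM:SS' or 'H:MM:SS.s' into a timedelta."""
    hours, minutes, seconds = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))

# Run number -> reported duration.
RUNS = {1: "5:28:45", 2: "2:03:13", 3: "0:00:41", 4: "0:00:06.7"}

baseline = parse_duration(RUNS[1])
# Dividing two timedeltas yields a plain float ratio.
speedups = {no: baseline / parse_duration(d) for no, d in RUNS.items()}

for no, factor in speedups.items():
    print(f"run {no}: {factor:8.1f}x faster than run 1")
```

The empty-cache run is under three times faster than the uncached one, whereas a full cache and the P2P distribution bring the total down from hours to seconds.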

Conclusions

The core of our approach is based on a tree edit distance algorithm (Kouylekov, Magnini, 2005).

The main idea is to transform the hypothesis using sources like DIRT, WordNet, Wikipedia, and an acronyms database.

Additionally, we built a system to acquire the extra background knowledge and applied complex grammar rules for rephrasing in English.

At each step, we analysed the influence of the resources used and identified and addressed new subproblems.

Future work

Search for a method to establish more precise values for penalties

The multiplication coefficients for the parameters in the extended local fitness

Using machine learning to establish the global threshold

Inserting the Textual Entailment system as part of a Question Answering system

Building a Romanian Textual Entailment System

Acknowledgments

Pre-processing: Daniel Matei

NLP group of Iasi:
- Coordinator: Prof. Dan Cristea
- Diana Trandabat, Corina Forascu, Ionut Pistol, Marius Raschip

Anaphora resolution group: Iustin Dornescu, Alex Moruz, Gabriela Pavel

THANK YOU!