Hypothesis Transformation and Semantic Variability Rules Used in RTE
Adrian Iftene, Alexandra Balahur-Dobrescu
“Al. I. Cuza” University, Iasi, Romania
Faculty of Computer Science
Adrian Iftene&Alexandra Balahur-Dobrescu – “Al.I.Cuza” University of Iasi, Romania
Overview
• System presentation
• Tools
• Resources
• Semantic variability rules
• Fitness calculation
• Results
• Peer-to-Peer architecture
• Conclusions and Future Work
System presentation
[Figure: system architecture. Labels: Initial data; LingPipe module (named entities for (T, H) pairs); Minipar module (dependency trees for (T, H) pairs); Core Module 1, Core Module 2, Core Module 3 on P2P computers; Resources: DIRT, Acronyms, Background knowledge, WordNet, Wikipedia; Final result.]
Tools - LingPipe
LingPipe (http://www.alias-i.com/lingpipe) is a suite of Java libraries for the linguistic analysis of human language. The major tools are for:
• Sentence
• Parts of Speech
• Named Entities
• Coreference
Example: Hypothesis from pair 111: Leloir was born in Argentina.
<ENAMEX TYPE="PERSON">Leloir</ENAMEX> was born in <ENAMEX TYPE="LOCATION">Argentina</ENAMEX>.
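The annotated hypothesis above can be turned into a list of named entities with a few lines of code. The following is only a minimal sketch in Python (the system itself uses the Java LingPipe libraries); the function names are hypothetical and it assumes the ENAMEX annotations are not nested.

```python
import re

# Matches one annotation of the form <ENAMEX TYPE="...">text</ENAMEX>.
ENAMEX_RE = re.compile(r'<ENAMEX TYPE="([^"]+)">(.*?)</ENAMEX>')


def extract_named_entities(annotated):
    """Return (entity text, entity type) pairs in order of appearance."""
    return [(text, etype) for etype, text in ENAMEX_RE.findall(annotated)]


def strip_annotations(annotated):
    """Return the plain sentence with the ENAMEX tags removed."""
    return ENAMEX_RE.sub(r"\2", annotated)


annotated = (
    '<ENAMEX TYPE="PERSON">Leloir</ENAMEX> was born in '
    '<ENAMEX TYPE="LOCATION">Argentina</ENAMEX>.'
)
print(extract_named_entities(annotated))
print(strip_annotations(annotated))
```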
Tools - MINIPAR
MINIPAR (Lin, 1998) transforms the text and the hypothesis into dependency trees.
Example: Le Beau Serge was directed by Chabrol.
(
E0 (() fin C * )
1 (Le ~ U 3 lex-mod (gov Le Beau Serge))
2 (Beau ~ U 3 lex-mod (gov Le Beau Serge))
3 (Serge Le Beau Serge N 5 s (gov direct))
4 (was be be 5 be (gov direct))
5 (directed direct V E0 i (gov fin))
E2 (() Le Beau Serge N 5 obj (gov direct) (antecedent 3))
6 (by ~ Prep 5 by-subj (gov direct))
7 (Chabrol ~ N 6 pcomp-n (gov by))
8 (. ~ U * punc)
)
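[Figure: dependency tree rooted at direct (V). Its children are Le_Beau_Serge (N) via edge s, be (be) via edge be, Chabrol (N) via edge by, and Le_Beau_Serge (N) via edge obj. Le (U) and Beau (U) attach to Le_Beau_Serge (N) via lex-mod edges.]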
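To make the tree structure concrete, here is a small Python sketch that rebuilds the dependency tree for the example sentence. It is an illustration only: the node tuples are transcribed by hand from the MINIPAR output above (no MINIPAR parser is invoked), the tuple layout and function names are assumptions, and the preposition "by" is kept as its own node instead of being folded into an edge label as in the figure.

```python
from collections import defaultdict

# (id, lemma, category, head id, relation); head None marks the root.
# Transcribed by hand from the MINIPAR output for the example sentence.
NODES = [
    (1, "Le", "U", 3, "lex-mod"),
    (2, "Beau", "U", 3, "lex-mod"),
    (3, "Le_Beau_Serge", "N", 5, "s"),
    (4, "be", "be", 5, "be"),
    (5, "direct", "V", None, None),
    ("E2", "Le_Beau_Serge", "N", 5, "obj"),
    (6, "by", "Prep", 5, "by-subj"),
    (7, "Chabrol", "N", 6, "pcomp-n"),
]


def render(nodes):
    """Return the tree as indented lines, one node per line."""
    by_id = {node[0]: node for node in nodes}
    children = defaultdict(list)
    for node_id, _lemma, _cat, head, _rel in nodes:
        if head is not None:
            children[head].append(node_id)
    root = next(node[0] for node in nodes if node[3] is None)

    lines = []

    def visit(node_id, depth):
        _id, lemma, cat, _head, rel = by_id[node_id]
        label = f"{lemma} ({cat})" + (f" <-{rel}" if rel else "")
        lines.append("  " * depth + label)
        for child in children[node_id]:
            visit(child, depth + 1)

    visit(root, 0)
    return lines


tree_lines = render(NODES)
print("\n".join(tree_lines))
```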
Resources
• DIRT - Discovery of Inference Rules from Text
• Extended WordNet
• Acronyms
• Background Knowledge
Resources – DIRT
DIRT is both an algorithm and a resulting knowledge collection (Lin and Pantel, 2001).
"X solves Y":
• Y is solved by X
• X resolves Y
• X finds a solution to Y
• X tries to solve Y
• X deals with Y
• Y is resolved by X
• …
Example: Le Beau Serge was directed by Chabrol
N:s:V<direct>V:by:N
N:obj:V<direct>V:by:N
N:s:V<direct>V:
:V<direct>V:by:N
:V<direct>V:by:N
N:obj:V<direct>V:
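The path strings in the example follow a regular pattern: a noun slot, a relation, the verb, a second relation and a second noun slot. As a hedged illustration, the sketch below builds the full and partial path strings from the relation names; the helper names are hypothetical and the string format is inferred from the example, not taken from the DIRT distribution.

```python
def full_path(left_rel, verb, right_rel):
    """Path linking both noun slots through the verb, e.g. N:s:V<direct>V:by:N."""
    return f"N:{left_rel}:V<{verb}>V:{right_rel}:N"


def left_half(left_rel, verb):
    """Partial path that keeps only the left slot, e.g. N:s:V<direct>V:."""
    return f"N:{left_rel}:V<{verb}>V:"


def right_half(verb, right_rel):
    """Partial path that keeps only the right slot, e.g. :V<direct>V:by:N."""
    return f":V<{verb}>V:{right_rel}:N"


# Relations of the verb "direct" in "Le Beau Serge was directed by Chabrol".
for left_rel in ("s", "obj"):
    print(full_path(left_rel, "direct", "by"))
    print(left_half(left_rel, "direct"), right_half("direct", "by"))
```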
Resources – DIRT (cont...)
Left – left relations similarity
[Figure: the hypothesis verb has relation1 to its left subtree and relation2 to its right subtree; the text verb has relation1 to its left subtree and relation3 to its right subtree.]
Pair 37:
T: She was transferred again to Navy when the American Civil War began, 1861.
H: The American Civil War started in 1861.
H’: The American Civil War began in 1861.
Resources – DIRT (cont...)
Left – right relations similarity
[Figure: the hypothesis verb has relation1 to its left subtree and relation2 to its right subtree; the text verb has relation3 to its left subtree and relation1 to its right subtree.]
Pair 161:
T: The demonstrators, convoked by the solidarity with Latin America committee, verbally attacked Salvadoran President Alfredo Cristiani.
H: President Alfredo Cristiani was attacked by demonstrators.
H’: Demonstrators attacked President Alfredo Cristiani.
Resources – eXtended WordNet
For every synonym, we check to see which word appears in the text tree, and select the mapping with the best value according to the values from eXtended WordNet (http://xwn.hlt.utdallas.edu/downloads.html)
For example, the relation between “relative” and “niece” is made with a score of 0.078652.
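The selection step described above can be sketched as a lookup followed by a maximum. This is a minimal illustration, not the system's code: the score table layout and the function name are assumptions, and only the “relative”/“niece” score comes from the slide; the second entry is a placeholder added so that there is something to choose between.

```python
# Scores keyed by (hypothesis word, candidate word from the text tree).
# Only the ("relative", "niece") value is from the slide.
XWN_SCORES = {
    ("relative", "niece"): 0.078652,
    ("relative", "person"): 0.01,  # placeholder, NOT a real eXtended WordNet value
}


def best_mapping(hyp_word, text_words, scores):
    """Return (text word, score) with the highest score, or None if nothing maps."""
    candidates = [
        (scores[(hyp_word, word)], word)
        for word in text_words
        if (hyp_word, word) in scores
    ]
    if not candidates:
        return None
    score, word = max(candidates)
    return word, score


print(best_mapping("relative", ["aunt", "niece", "person"], XWN_SCORES))
```

Words from the text tree that have no entry in the table are simply skipped, so a hypothesis word with no related word in the text yields no mapping.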
Resources - Acronyms
The acronyms database (http://www.acronym-guide.com) helps our program find relations between an acronym and its meaning, for example “US - United States”.
Resources – Background Knowledge
“Argentine”: extracted snippets from Wikipedia:
ar |calling_code = 54 |footnotes = Argentina also has a territorial dispute
Argentina', , Nación Argentina (Argentine Nation) for many legal purposes), is
in the world. Argentina occupies a continental surface area of
Argentina national football team
The patterns used are usually “definition” patterns:
- verbs like “is”, “define”, “represent”, etc.
- punctuation context , “ ‘ () [] :
- anaphora resolution
Extracted relations:
Argentine [is] Argentina
Chinese [in] China
Los Angeles [in] California
2 [is] two
Netherlands [is] Holland
Relations for “Netherlands”:
Netherlands [is] Dutch
Netherlands [is] Nederlandse
Netherlands [is] Antillen
Netherlands [in] Europe
Netherlands [is] Holland
Antilles [in] Netherlands
Semantic Variability Rules
• Negation rule – given by terms like “no”, “not”, “never”
• Modal verbs: “may”, “might”, “cannot”, “should”, “could”
• Certain cases for particle “to” when it precedes:
  • a verb: “allow”, “impose”, “galvanize”
  • adjective like “necessary”, “compulsory”, “free”
  • noun like “attempt”, “trial”
• Influence of context:
  • Positive words: “certainly”, “absolutely”
  • Negative words: “probably”, “likely”
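One way to picture how such rules could feed the fitness computation is to mark each verb as positive or not, depending on the words just before it. The sketch below is an assumption-heavy illustration: the window size, the function names and the decision to leave verbs positive when only positive context words are present are all hypothetical, verb positions are supplied by hand, and the particle “to” cases are not covered.

```python
# Word lists taken from the slide.
NEGATION = {"no", "not", "never"}
MODALS = {"may", "might", "cannot", "should", "could"}
NEGATIVE_CONTEXT = {"probably", "likely"}
# Positive context words ("certainly", "absolutely") leave the verb positive,
# so they need no explicit check in this sketch.


def verb_is_positive(tokens, verb_index, window=2):
    """Assumed rule: a verb stops being positive if one of the `window`
    tokens before it is a negation, a modal or a negative context word."""
    start = max(0, verb_index - window)
    context = [token.lower() for token in tokens[start:verb_index]]
    markers = NEGATION | MODALS | NEGATIVE_CONTEXT
    return not any(token in markers for token in context)


def positive_verb_ratio(tokens, verb_indices):
    """Share of verbs that are still positive (1.0 if there are no verbs)."""
    if not verb_indices:
        return 1.0
    positive = sum(verb_is_positive(tokens, i) for i in verb_indices)
    return positive / len(verb_indices)


plain = "Leloir was born in Argentina .".split()
negated = "Leloir was not born in Argentina .".split()
print(positive_verb_ratio(plain, [2]))    # verb "born" at index 2
print(positive_verb_ratio(negated, [3]))  # verb "born" at index 3
```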
Fitness calculation 1
Local Fitness:
• 1 at direct mapping, Acronyms, BK
• DIRT score
• eXtended WordNet score
Extended Local Fitness:
• Local Fitness
• Parent Fitness
• Mapping of edge label
• Node Position (left or right)
[Figure: a hypothesis tree node mapped onto the text tree, showing the node mapping, father mapping and edge label mapping.]
Fitness calculation 2
Total Fitness:
\[ TF = \frac{\sum_{node \in H} ExtendedLocalFitness_{node}}{HypothesisNodesNumber} \]
The Negation Value:
\[ NV = \frac{Positive\_VerbsNumber}{TotalNumberOfVerbs} \]
Global Fitness:
\[ GlobalFitness = NV \cdot TF + (1 - NV) \cdot (4 - TF) \]
Threshold value = 2.06
Fitness calculation 3
T: The French railway company SNCF is cooperating in the project.
H: The French railway company is called SNCF.
Initial entity            Node Fitness   Extended local fitness
(the, company, det)       1              3.125
(French, company, nn)     1              3.125
(railway, company, nn)    1              3.125
(company, call, s)        1              2.5
(be, call, be)            1              4
(call, -, -)              0.096          3.048
(company, call, obj)      1              1.125
(SNCF, call, desc)        1              2.625
• Total_Fitness = (3.125 + 3.125 + 3.125 + 2.5 + 4 + 3.048 + 1.125 + 2.625)/8 = 22.673/8 = 2.834
• NV = Positive_VerbsNumber/TotalNumberOfVerbs = 1/1 = 1
• GlobalFitness = 1*2.834 + (1-1)*(4-2.834) = 2.834
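The arithmetic of this example can be checked with a short script. The sketch below recomputes the total and global fitness from the extended local fitness values in the table; the function names are hypothetical, and the comparison against the threshold (entailment when the global fitness is at least 2.06) is an assumption, since the slides only give the threshold value.

```python
THRESHOLD = 2.06  # threshold value from the slides


def total_fitness(extended_local_fitness):
    """Average extended local fitness over the hypothesis nodes."""
    return sum(extended_local_fitness) / len(extended_local_fitness)


def global_fitness(nv, tf):
    """GlobalFitness = NV*TF + (1 - NV)*(4 - TF)."""
    return nv * tf + (1 - nv) * (4 - tf)


# Extended local fitness of the eight hypothesis nodes, in table order.
values = [3.125, 3.125, 3.125, 2.5, 4, 3.048, 1.125, 2.625]

tf = total_fitness(values)
nv = 1 / 1  # one verb, and it is positive
gf = global_fitness(nv, tf)
print(round(tf, 3), round(gf, 3))

# Assumed decision rule: entailment when the global fitness reaches the threshold.
print("above threshold:", gf >= THRESHOLD)

# With NV = 0 (the only verb negated) the same TF is mirrored around 2.
print(round(global_fitness(0, tf), 3))
```

The last line shows why the formula uses 4 - TF: a well-mapped hypothesis whose verb is negated ends up with a low global fitness.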
Results 1
        IE     IR      QA      SUM     Global
Run01   0.57   0.69    0.87    0.635   0.6913
Run02   0.57   0.685   0.865   0.645   0.6913
Component relevance:
System Description   Precision   Relevance
Without DIRT         0.6876      0.54 %
Without WordNet      0.6800      1.63 %
Without Acronyms     0.6838      1.08 %
Without BK           0.6775      2.00 %
Without SVR          0.6763      2.17 %
Without NEs          0.5758      16.71 %
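The relevance figures are consistent with a relative drop in precision with respect to the complete system. The slides do not state the formula, so the sketch below is an inference: it assumes relevance = 100 * (full - ablated) / full, with the complete system's precision taken as the Global value 0.6913, and it reproduces the reported percentages to within rounding.

```python
FULL_SYSTEM_PRECISION = 0.6913  # Global value of the complete system

# Precision of each ablated configuration, from the component relevance table.
ABLATED_PRECISION = {
    "Without DIRT": 0.6876,
    "Without WordNet": 0.6800,
    "Without Acronyms": 0.6838,
    "Without BK": 0.6775,
    "Without SVR": 0.6763,
    "Without NEs": 0.5758,
}


def relevance(full, ablated):
    """Relative precision loss, in percent, when a component is removed."""
    return 100 * (full - ablated) / full


for name, precision in ABLATED_PRECISION.items():
    print(f"{name:18s} {precision:.4f}  {relevance(FULL_SYSTEM_PRECISION, precision):6.2f} %")
```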
Results 2
Pilot task: Yes, No/Unknown + answer justification
           Accuracy   F(b=1/3)   Precision   Recall
System 1   0.569      0.595      0.547       0.805
System 2   0.471      0.475      0.437       0.643
Table over 12 submitted runs:
         Accuracy   F(beta=1/3)
min      0.365      0.211
median   0.471      0.475
max      0.731      0.753
Mean [understandability correctness]: [4.1 2.0] [4.3 2.8]* [4.1 1.5] [2.7 1.2] [3.2 1.5] [3.1 1.5]
Peer-to-Peer Architecture
[Figure: peer-to-peer network. Labels: Initiator; six core modules (CM); DIRT db; Acronyms; SMB upload; SMB download.]
• Speed optimization
• P2P architecture, cache mechanism
• Ending synchronization
• Quota mechanism
Results
No   Run details                                                           Duration
1    One computer without caching mechanism                                5:28:45
2    One computer with caching mechanism, but with empty cache at start    2:03:13
3    One computer with full cache at start                                 0:00:41
4    5 computers with 7 processes                                          0:00:06.7
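To put the durations in perspective, the sketch below converts them to seconds and computes the speedup of each run relative to run 1. It is a plain arithmetic illustration with hypothetical helper names; the last duration is read as 0:00:06.7 for the run with 5 computers.

```python
def to_seconds(duration):
    """Convert an h:mm:ss or h:mm:ss.f string to seconds."""
    hours, minutes, seconds = duration.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


# (run number, run details, duration) as reported in the results table.
RUNS = [
    (1, "One computer without caching mechanism", "5:28:45"),
    (2, "One computer with caching mechanism, empty cache at start", "2:03:13"),
    (3, "One computer with full cache at start", "0:00:41"),
    (4, "5 computers with 7 processes", "0:00:06.7"),
]

baseline = to_seconds(RUNS[0][2])
speedups = {no: baseline / to_seconds(duration) for no, _details, duration in RUNS}

for no, details, duration in RUNS:
    print(f"{no}  {duration:>10s}  x{speedups[no]:8.1f}  {details}")
```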
Conclusions
The core of our approach is based on a tree edit distance algorithm (Kouylekov, Magnini, 2005)
The main idea is to transform the hypothesis using resources such as DIRT, WordNet, Wikipedia and an acronyms database
Additionally, we built a system to acquire the extra background knowledge and applied complex grammar rules for rephrasing in English
At each step, we analyzed the influence of the resources used, and identified and addressed new subproblems
Future work
Search for a method to establish more precise values for penalties
Determine the multiplication coefficients for the parameters in the extended local fitness
Using machine learning to establish the global threshold
Inserting the Textual Entailment system as part of a Question Answering system
Building a Romanian Textual Entailment System
Acknowledgments
Pre-processing: Daniel Matei
NLP group of Iasi:
Coordinator: Prof. Dan Cristea
Diana Trandabat, Corina Forascu, Ionut Pistol, Marius Raschip
Anaphora resolution group: Iustin Dornescu, Alex Moruz, Gabriela Pavel
THANK YOU!